De-blackboxing “the Cloud” and the Principle of Scalability

Abstract

The abstractness of the term “the cloud” has left many unknowns around a technology that has been rapidly evolving and is present in most of the computational and technological advancements we use on a regular basis. The nature and characteristics of the cloud create a mystery around the systems and infrastructure, both computational and physical, that accompany cloud computing. By de-blackboxing the main features, characteristics and concepts of cloud computing, this piece emphasizes that the vast production of data can also lead to the overuse of data centers and physical infrastructure, which ultimately has an impact on the environment.

Introduction

 Figure 1. via GIPHY 

Cloud computing has expanded rapidly and been put to wide use over the past decade. Individuals, businesses, educational institutions, government agencies and even health care establishments rely on the efficiency, safety and operability of “the cloud” for day-to-day functions and operations. The cloud was gradually adopted by anyone with a smart device, since big tech and software companies not only use cloud computing in their products but also create it, develop it and hold the major decisions over it. With the rapid evolution of technology, data is constantly being transferred, saved, uploaded and downloaded in such large amounts that only powerful “high-performance computing” systems, such as cloud computing, can “handle the big data problems we have today in reasonable time” (Alpaydin, 2017, p. 176). Being able to access your data from any location, without having to use a specific device or carry a floppy disk or USB stick, was not fathomable a few decades ago. The idea of an “invisible” cloud where anything can be stored, manipulated and redistributed has made people’s fast-paced lives more convenient. It is not just personal use that comes into play: businesses and large corporations no longer have to invest in thousands of computers, maintenance and support staff, or their own data servers and space, since someone else can provide that service to them (De Bruin & Floridi, 2017; Bojanova et al., 2013). An intangible, invisible “cloud”. Or is it? To what extent is it as abstract as most people think it is? De-blackboxing cloud computing, or “the cloud”, is critical to understanding its implications both virtually and in the real, physical world.
This piece investigates how, and to what extent, the use and consumption of cloud computing affects its physical infrastructure and the environmental impact of that infrastructure.

What is “The Cloud”?

One of the biggest cloud computing companies, Amazon Web Services (AWS), defines cloud computing as “the on-demand delivery of IT resources via the Internet”, providing access to technology services on an as-needed basis (AWS, 2019). Among the plethora of things that cloud computing can be and is used for are: “data backup, disaster recovery, email services, sharing virtual desktops, big data analytics, customer-facing web applications. It can also be used for personalized treatment of pat[i]ents, fraud detection for finance companies, provide online games for millions of people/players around the world and more” (Theocharaki, 2021).

The National Institute of Standards and Technology (NIST) defines cloud computing as:

 Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models. (Ruparelia, 2016, p. 4)

Because the technology is still new and constantly evolving, and because of the black-box mystery attached to it, there is less an exact definition than an overall concept of what cloud computing is and does. Overall, the cloud is “actually a service or group of services” (Rountree et al., 2014, p. 1): a collection of technologies that work together so that data and computation are handled in large, remote, off-site data centers. According to NIST, cloud computing can be distinguished by three main components: key cloud characteristics, cloud deployment models and cloud service models (Alpaydin, 2017; Rountree et al., 2014). Behind this “modern buzzphrase” of cloud computing “hides a rich tradition of information sharing and distributed computing” (Denning & Martell, 2015), and the vast unknown of what took place inside the box gave it its famous name: “the Cloud”.

History of The Cloud

 Figure 2. Project MAC’s, IBM 7094

In the 1950s and 1960s, big companies such as IBM had already figured out a business model for cloud computing in the form of “rented computation time on large mainframe computers”, and researchers such as John McCarthy, a leading artificial intelligence computer scientist at Stanford, investigated the “ideas of computation as a utility function” (De Bruin & Floridi, 2017, p. 24). In the mid-1960s the Massachusetts Institute of Technology (MIT) launched Project MAC, an acronym for “multiple-access computer” or “man and computer”, which developed the “idea of building systems that could share computing power among many users” (Denning & Martell, 2015, p. 28). Project MAC led to the invention of Multics, an early operating system that allowed memory, disk and CPU to be shared among many people, with the incentive of splitting the cost and thereby lowering the price each individual paid.

Figure 3. The H6180 Multics at MIT

The supply of computing power would be used as a utility, a commodity that anyone could use. Towards the end of the decade, ARPANET (the Advanced Research Projects Agency Network) followed the essence of utility: resource sharing and wide accessibility. As long as you were connected to the network, you could connect to any host and therefore to its service(s). This soon evolved into what we now know as the TCP/IP protocols, which were officially set and standardized on ARPANET in 1983. TCP/IP allowed messages to be exchanged without knowing someone’s actual location, only their IP address, and it was based on open standards that could be used in open-source software (Denning & Martell, 2015; Irvine 2021; Nelson, 2009). With the adoption of the Domain Name System (DNS) a year later, host names could be mapped to numeric IP addresses (xxx.xxx.x.xx), creating even more flexibility in communication and in locating things on the Internet (Denning & Martell, 2015).
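The mapping from host names to numeric addresses can be illustrated with a toy lookup table. This is only a sketch of the idea, not how DNS itself is implemented: the `HOST_TABLE` entries and the `resolve` helper are hypothetical, and the names and addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical host table: names and addresses are documentation-only examples.
HOST_TABLE = {
    "mail.example.org": "192.0.2.10",
    "files.example.org": "192.0.2.25",
}

def resolve(host_name: str) -> ipaddress.IPv4Address:
    """Return the numeric address registered for a host name (case-insensitive)."""
    return ipaddress.IPv4Address(HOST_TABLE[host_name.lower()])

address = resolve("Mail.Example.org")
print(address)       # 192.0.2.10 (the dotted form people read)
print(int(address))  # 3221225994 (the same address as one 32-bit number)
```

The sender only needs the name; the table, not the user, keeps track of which number currently stands behind it.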

By the 1990s the World Wide Web was taking over, and as cloud computing started gaining fame in the early 2000s, the de-blackboxing of these types of computing and the knowledge behind their functionalities paved the way for how they were to be understood by the general public. The presence of the WWW created further transparency and ‘manipulation’ of information objects across networks and the Internet, especially after the creation of Uniform Resource Locators (URLs) and the Digital Object Identifier (DOI) system, which gave unique identifiers and ‘names’ to ‘things’ on the Internet, creating unique digital objects (Nelson, 2009; Denning & Martell, 2015).

The client-server architecture still used by most web services in the cloud today can be attributed to three systems: MIT’s Multics, which developed the idea of sharing the resources of a mainframe system among multiple users; Xerox Palo Alto Research Center’s “Alto”, a network of independent graphic workstations connected together on an Ethernet; and another MIT creation, the ‘X-Window’ client-server system, which provided pre-established client-server communication protocols, allowing new service providers to use their own hardware and user interfaces without the extra hassle of designing new protocols (Denning & Martell, 2015, p. 30).

 Figure 4. Xerox PARC’s Alto system

With the creation of more and different tech products and services, such as PCs, tablets, smartphones and email, cloud computing gained huge interest as it managed to adapt to and support these ‘expansions’. In 2006, Google’s then CEO, Eric Schmidt, popularized the term that most people now know as “The Cloud”, which has become a part of pretty much anything we do that is related to technology in one way or another (De Bruin & Floridi, 2017, p. 23-24).

Architecture & Functionality

Almost everything we do or use in our day-to-day technology is in one way or another a part or a process of cloud computing. Email services, video streaming services such as Netflix or YouTube, smartphone speech recognition, sharing files on Google Drive, uploading them to Dropbox, sending photos, attending online school on Zoom, or working with ten other people on the same project at the same time on a shared platform: all of this relies on “the cloud”, which we have now come almost to take for granted. The complexity of the systems and processes that make up “the cloud”, and the fact that it encompasses such an interconnected vastness of services, frameworks, paths, etc., makes it all the more complicated to untangle and understand. Yet however broad the definition or concept of “the cloud” may be, not everything that is on the Internet, and not every Web-based application or product, is a cloud application.

The five main characteristics that a service/software/system/platform needs to have in order to be considered a part of cloud computing are: on-demand self-service, broad network access, resource pooling, rapid elasticity and measured service (Ruparelia, 2016; Rountree, 2014). On-demand self-service is the idea that users/consumers are able to access the service on their own, without the presence of a “middleman”. Broad network access refers to the accessibility of cloud services, which require only a network connection in order to reach the services/products/applications. The better the connection, whether a LAN (Local Area Network) or a good Internet connection, the better and more efficient the services will be; the services should also support access from any type of device (e.g. smartphones, tablets, computers). Resource pooling benefits the provider’s side: since customers/users will not all need to use every resource available to them at the same time, the resources that one customer is not using can be put to use by another. This saves resources and allows providers to serve more customers than they could if all resources were constantly reserved ‘to work’ for one user, even when they were not being used.
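The saving behind resource pooling can be sketched with a small calculation. The demand figures below are invented for illustration, and `dedicated_capacity` and `pooled_capacity` are hypothetical helper names; the point is only that a shared pool has to cover the combined peak, which is lower than the sum of each customer’s individual peak whenever those peaks fall at different times.

```python
def dedicated_capacity(demand_by_customer):
    """Capacity needed if every customer gets hardware sized for their own peak."""
    return sum(max(series) for series in demand_by_customer.values())

def pooled_capacity(demand_by_customer):
    """Capacity needed if all customers share one pool sized for the combined peak."""
    combined = [sum(step) for step in zip(*demand_by_customer.values())]
    return max(combined)

# Hypothetical hourly demand, in arbitrary resource units, for three customers.
demand = {
    "customer_a": [8, 2, 1, 3],
    "customer_b": [1, 7, 2, 2],
    "customer_c": [2, 1, 9, 3],
}

print(dedicated_capacity(demand))  # 24 (8 + 7 + 9)
print(pooled_capacity(demand))     # 12 (busiest combined hour: 1 + 2 + 9)
```

In this made-up case the provider needs half the hardware to serve the same three customers, which is the sense in which pooling “saves resources”.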

Rapid elasticity is the ability of the cloud to grow to meet the user’s demand. Through automation and orchestration, when the resources have been used to their full capacity, the system automatically seeks to expand capacity. On the customer’s end this looks like unimaginably large space in the cloud, but in reality, for the providers, the more space is wanted the more physical resources, such as computers, hard disks and servers, need to be put in place. The key, however, is that in order for providers to “save on consumption costs” such as power, electricity and cooling, “even though the resources are available, they are not used until needed”. Similarly, once “the usage has subsided, the capacity shrinks as needed to ensure that resources are not wasted” (Rountree, 2014). Measured service is the fifth characteristic that a service/software/system/platform needs in order to be considered cloud computing. It means that cloud service providers are able to measure usage, such as time (for how long someone has been using the service) and the amount of data used (how much space it is taking up). This is also what determines the rates and prices of plans. If you have ever received a notification about running out of cloud storage on an Apple device, or needed to update your payment plan on Google Drive, and you have paid money that ‘magically’ increased your space in “the cloud”, then you have experienced this ‘phenomenon’, one could even say ‘luxury’, of measured service and rapid elasticity (Rountree, 2014).
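Rapid elasticity and measured service can be sketched together in a few lines. This is a simplified illustration rather than any provider’s actual autoscaling or billing logic: the function names, the demand series, the per-server capacity and the price are all assumptions made up for the example.

```python
import math

def servers_needed(demand, capacity_per_server):
    """Smallest number of servers that covers the demand (never fewer than one)."""
    return max(1, math.ceil(demand / capacity_per_server))

def simulate(demand_per_hour, capacity_per_server, price_per_server_hour):
    """Resize capacity every hour and meter the server-hours actually used."""
    allocation = [servers_needed(d, capacity_per_server) for d in demand_per_hour]
    server_hours = sum(allocation)
    return allocation, server_hours, server_hours * price_per_server_hour

# Hypothetical demand (requests per second) over four consecutive hours.
demand = [40, 120, 310, 90]
allocation, hours, bill = simulate(
    demand, capacity_per_server=100, price_per_server_hour=0.5
)

print(allocation)  # [1, 2, 4, 1] capacity grows with demand, then shrinks again
print(hours)       # 8 metered server-hours
print(bill)        # 4.0 charged for what was used, not for the four-server peak
```

Sizing for the peak all day would have meant 16 server-hours; scaling down when “the usage has subsided” halves both the metered usage and the resources left running.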

Cloud Service Models

As previously mentioned, the wide range of services offered through cloud computing is grouped into “cloud service models”, broad categories based on the kinds of services offered, their target audience, responsibilities and tasks, costs, etc. The three basic service models are Infrastructure as a Service, Platform as a Service and Software as a Service.

Infrastructure as a Service, also referred to as IaaS, provides “basic infrastructure services to customers”: the hardware infrastructure, both physical and virtual machines (i.e. networking, servers, storage, plants, etc.), on a utility basis. Examples include IT systems monitoring, backup and recovery, and platform and web hosting (Rountree, 2014, p. 7; Ruparelia, 2016, p. 21 & 131). Some “real life” applications are file synchronization with Dropbox, printing with Google Print, hosting on Amazon EC2 or HP Cloud, and storage on Amazon Cloud Drive or Apple’s iCloud (Ruparelia, 2016). Platform as a Service, or PaaS, “provides an operating system, development platform, and/or a database platform”. This creates the ideal environment for installing and running software and developing applications, eliminating the need for the client company to build its own infrastructure in order to develop those apps. Other “real life” uses include development with languages such as C, C++ and Java, and database services for business intelligence and data warehousing. Software as a Service, or SaaS, provides “application and data services” by supplying hosted applications, without the need to download and install them, pay extra for them or give up space for them on your hard drive or storage disk. From the application skeleton itself to all the data that comes with it, SaaS means that the cloud service provider is responsible for maintaining all the platforms and infrastructure needed for the services to take place. SaaS is “the original cloud service model […] and remains the most popular” as it “offers the largest number of provider options” (Rountree, 2014, p. 7). Its use cases include billing and invoicing, asset and content management, image rendering, email and instant messaging, and more. Applications of SaaS include email services such as Gmail, Hotmail and Outlook, collaborative suites such as Google Docs and Microsoft Office 365, content management such as Box and SpringCM, and customer relationship management with Salesforce (Ruparelia, 2016).
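One way to picture the difference between the three service models is as a line drawn through a stack of layers, with the provider managing everything below the line. The sketch below is a common simplification and an assumption of this example, not a scheme taken from the cited sources; the layer names and the `split_responsibility` helper are hypothetical.

```python
# Simplified stack, ordered from the physical bottom to the user-facing top.
LAYERS = [
    "networking", "storage", "servers", "virtualization",
    "operating system", "runtime", "data", "application",
]

# Highest layer the provider manages under each model (illustrative assumption).
PROVIDER_MANAGES_UP_TO = {
    "IaaS": "virtualization",
    "PaaS": "runtime",
    "SaaS": "application",
}

def split_responsibility(model):
    """Return (layers run by the provider, layers left to the customer)."""
    cut = LAYERS.index(PROVIDER_MANAGES_UP_TO[model]) + 1
    return LAYERS[:cut], LAYERS[cut:]

for model in ("IaaS", "PaaS", "SaaS"):
    provider, customer = split_responsibility(model)
    print(f"{model}: customer still manages {customer or 'nothing'}")
```

Moving from IaaS to SaaS, the customer’s list shrinks until, as described above, the provider is responsible for the whole stack.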

 Figure 5. Cloud Service Models diagram by David Choo

In Cloud Computing (2016), Ruparelia and a few others identify and discuss further, more specific service offerings in terms of their abstraction levels. Information as a Service (INaaS) and Business Process as a Service (BPaaS) are two of those. INaaS is responsible for providing business information that is relevant to the specific customer, whether at an individual, business or corporate level. This may include market information, price information, stock price information, information validation, business processes and tasks, health information from patients, real-time flight information, etc. (Ruparelia, 2016, p. 177; Mosbah et al., 2013, p. 3). BPaaS aids in business process outsourcing by carrying out business functions that rely on large amounts of services and data to support a business’s functioning. This can include ecommerce, payroll and printing, tax computation, auditing, health pre-screening, ticketing and billing, etc. Google’s AdSense and IBM’s Blue are examples of these (Ruparelia, 2016, p. 193; Mosbah et al., 2013, p. 3).

Cloud Deployment Models

With the wide variety of cloud computing options and services, each individual, business, organization or corporation differs in what they need cloud services for. To support the environment in which personal or business use is needed or wanted, a suitable kind of cloud environment must be implemented through different deployment models. The four deployment models of the cloud are public, private, community and hybrid.

The public cloud deployment model is the one most commonly thought of, as all of its services, systems and operations are housed with an external service provider. The infrastructure of the cloud is owned by the cloud service organizations, who are responsible for administering and managing the provided service, can apply this across abstraction levels and make it available via the Internet. Some examples of the public cloud model are Google Docs, Microsoft Office 365 and Amazon Cloud Player (Ruparelia, 2016; Mosbah et al., 2013; Rountree, 2014). In the private cloud deployment model, all the services, systems and resources are provided by and located in the private cloud of an individual company, organization or person, with zero access for the public. Private clouds can be accessed through a local area network (LAN), a wide area network (WAN) or a virtual private network (VPN), and are managed, operated and maintained by the individual(s) in question (Ruparelia, 2016; Mosbah et al., 2013; Rountree, 2014). The community cloud deployment model is a semi-public cloud or a “broader version of a private cloud” (Ruparelia, 2016, p. 32) and is shared among members of a group or organization that have some sort of shared goals, missions or concerns. It suits groups/organizations that, perhaps for security and safety reasons, do not want to use the public cloud, and the responsibility for maintenance is shared among the members/users who have access to it. Examples of its use include a publishing cloud, a health industry cloud or a banking regulation cloud (Ruparelia, 2016; Mosbah et al., 2013; Rountree, 2014). Finally, the last deployment model is the hybrid cloud. This entails a combination of two or more of the aforementioned cloud models that are not mixed but linked together to work more efficiently, to achieve their specific goals/operations and to allow data and application portability. A hybrid cloud can consist of public and private clouds, and the mixing and matching allows its users/customers more flexibility and choice in what they do and how they use their cloud services (Ruparelia, 2016; Mosbah et al., 2013; Rountree, 2014).

 Figure 6. Representation of cloud variety by Ruparelia et al.

 Figure 7. A great depiction of The Relationship between Services, Deployment Models, and Providers by Mosbah et al.

Data Centers, Principle of Scalability and Cloud Computing Emissions

Because of the ambiguity about what “the cloud” really is, and the notion that it might really just be a cloud, an invisible mass of data, information, systems and software, there is a lot of misunderstanding about its functions, operations and, of course, consequences. However, for the computational and electronic side of cloud computing to take place, there needs to be physical support for the cloud products and services and for the overall system. With the mass production, circulation, consumption and manipulation of data in unquantifiable amounts, technological challenges come into play. Scaling out is a main concern of cloud computing that is getting more and more attention, not only from people in tech and science but also from natural and environmental scientists and even pop culture. Cloud infrastructure relies on commodity equipment, which means that in order to “add capacity, you need to scale out instead of scaling up” (Rountree, 2014, p. 16). Scaling out can put extra pressure and burden on the data centers and facilities that host the cloud’s infrastructure and lead to “increased environment-related costs in resources such as power and cooling” (Rountree, 2014, p. 16), amongst a variety of other things.
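Why scaling out raises power and cooling costs can be sketched with a rough calculation. Everything numeric here is a made-up assumption (node capacity, node wattage and the overhead factor), and `scale_out` is a hypothetical helper. The overhead factor plays the role of power usage effectiveness (PUE), the ratio of total facility power to the power drawn by the IT equipment itself, so cooling and other overheads are whatever exceeds 1.0.

```python
import math

def scale_out(demand, node_capacity, node_watts, pue):
    """Return (nodes, IT power in kW, facility power in kW) for a given demand.

    Capacity is added by deploying more identical commodity nodes,
    not by making a single machine bigger.
    """
    nodes = math.ceil(demand / node_capacity)
    it_kw = nodes * node_watts / 1000
    facility_kw = it_kw * pue  # IT load plus cooling and other overhead
    return nodes, it_kw, facility_kw

# Hypothetical workload sizes, in arbitrary units of demand.
for demand in (1_000, 2_000, 4_000):
    nodes, it_kw, facility_kw = scale_out(
        demand, node_capacity=250, node_watts=500, pue=1.5
    )
    print(f"demand={demand}: {nodes} nodes, {it_kw} kW IT, {facility_kw} kW facility")
```

Under these assumptions every doubling of demand doubles the number of machines and, with them, the power the facility has to supply and the heat it has to remove.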

Data centers are the physical locations/sites/spaces, the true “home” of cloud computing, where all the countless servers and processors are housed. Data centers are spread out across different areas and cities, remote or otherwise, in the U.S. and all over the world. The various data centers can communicate and collaborate with each other through a network in which “tasks are automatically distributed and migrated from one to the other” (Alpaydin, 2017, p. 152).

“As cloud computing adoption increases, the energy consumption of the network and of the computing resources that underpin the cloud is growing and causing the emission of enormous quantities of CO2”, explain Gattulli et al. in their research on cloud computing emissions (Gattulli et al., 2014). In the past decade alone, “data center energy usage has decoupled from growth in IT workloads”, with public cloud vendors, which are also among the biggest (tech) companies in the world, deploying large numbers of new cloud systems and networks and leaving an environmental impact that, because of the nature of this technology, is often harder to assess than other sorts of emissions (Mytton, 2020). “Although the large cloud vendors are amongst the largest purchasers of renewable electricity, customers do not have access to the data they need to complete emissions assessments under the Greenhouse Gas Protocol”, which has led scientists and researchers such as Gattulli and Mytton to look for new ways and methods to control IT emissions and lessen the environmental impact that our overreliance on the efficiency of this technology has on our planet. Over the past five or so years, the carbon emissions of Information and Communication Technology (ICT) alone have amounted to 1.4% to 2% of total global greenhouse gas emissions, “approximately 730 million tonnes CO2 equivalent (Mt CO2-eq)” (Ericsson, 2021; Gattulli et al., 2014). Data centers used for the public internet alone consumed 110 TWh in 2015, almost 1% of the world’s electricity consumption (Ericsson, 2021). Often, we do not think of all the daily services and products we use that ultimately rely on the cloud for their functions, such as video streaming platforms, gaming, overall uses of AI and Machine Learning, cryptocurrencies, etc. In 2017, for example, Luis Fonsi and Daddy Yankee’s song “Despacito” (whose remix features Justin Bieber) “consumed as much electricity as Chad, Guinea‑Bissau, Somalia, Sierra Leone and the Central African Republic put together in a single year” through five billion streams and downloads, and Bitcoin mining “accounted for 0.2 percent of global electricity usage in mid-2018” (Ericsson, 2021).
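To get a feel for these magnitudes, the figures quoted above can be run through some simple arithmetic. The first function only rearranges numbers already cited (730 Mt CO2-eq and a 1.4% share). The second is a generic back-of-the-envelope estimate, energy multiplied by a facility overhead factor and a grid emission factor, whose input values are invented for illustration; it is not the method of any of the cited studies, and both function names are hypothetical.

```python
def implied_global_total(part_mt, share):
    """Global total (Mt CO2-eq) implied by a part and its share of the whole."""
    return part_mt / share

def workload_emissions_kg(it_energy_kwh, pue, grid_kg_per_kwh):
    """Rough emissions estimate: IT energy, scaled by facility overhead and grid intensity."""
    return it_energy_kwh * pue * grid_kg_per_kwh

# Figures quoted in the text: ICT emits about 730 Mt CO2-eq, about 1.4% of the total.
total = implied_global_total(730, 0.014)
print(f"{total:,.0f} Mt CO2-eq")  # 52,143 Mt CO2-eq

# Hypothetical workload: 1,000 kWh of server energy, PUE of 1.5,
# on a grid emitting 0.4 kg CO2-eq per kWh (all three values are assumptions).
print(workload_emissions_kg(1_000, 1.5, 0.4), "kg CO2-eq")
```

The second calculation also shows why the missing data matters: without the provider’s actual energy use, overhead and electricity mix, a customer can only guess at all three inputs.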

 Figure 8. Representation of the Carbon footprint of ICT and data traffic development by Ericsson
 Figure 9. Distribution of ICT’s carbon footprint in 2015 by Ericsson

Conclusion

The technological evolutions of the past decades have led to the amazing invention of cloud computing. The “explosive growth of data and the need for this data to be securely stored yet accessible anywhere, anytime” led to a higher demand, and even need, for cloud computing (Bojanova et al., 2013). Of course, this has created a cycle in which data and data services are constantly re-born and redistributed across the broad network and the cloud. The mystery behind what cloud computing and “the cloud” are doesn’t necessarily help with understanding and conceptualizing the physical and material aspect of this technology. It further obscures the implications of disregarding the fact that cloud computing isn’t so much in “the cloud” as in physical locations on earth, which keep growing in size and number with the exponential increase in demand for cloud computing services. As it happens, the data centers that are the backbone of cloud computing, as well as all the other external ‘expenditures’ such as electricity and maintenance, have much heavier implications for the environment than we would assume from a conceptually intangible technological advancement. Recent research and environmental analysis support the idea that low-carbon cloud computing solutions, renewable energy sources and access to data about cloud computing emissions and power usage effectiveness can increase awareness and understanding of what is going on behind the scenes of this technology that we hold so dear (Mytton, 2020; Gattulli et al., 2014; Ericsson, 2021).

https://www.youtube.com/watch?v=H_l1WVZkRX0

Bibliography 

Alpaydin, Ethem. (2016). Machine Learning: The New AI. Cambridge, MA: The MIT Press.

Amazon’s Amazon Web Services 

Bojanova, I., Zhang, J., and Voas, J. (2013).  “Cloud Computing,” in IT Professional, vol. 15, no. 2, pp. 12-14, doi: 10.1109/MITP.2013.26.

De Bruin, Boudewijn and Floridi, Luciano. (2017). The Ethics of Cloud Computing. Science and Engineering Ethics, vol. 23, no. 1 (February 1, 2017): 21–39.

Denning, Peter J. and Martell, Craig H. (2015). Great Principles of Computing. Cambridge, MA: The MIT Press.

Ericsson. (2021). ICT and the Climate. Ericsson. https://www.ericsson.com/4907a4/assets/local/reports-papers/consumerlab/reports/2020/ericsson-true-or-false-report-screen.pdf

Gattulli, M., Tornatore, M., Fiandra, R., and Pattavina, A. (2014). “Low-Emissions Routing for Cloud Computing in IP-over-WDM Networks with Data Centers,” in IEEE Journal on Selected Areas in Communications, vol. 32, no. 1, pp. 28-38, doi: 10.1109/JSAC.2014.140104.

Irvine, M. (2021) What is Cloud Computing? AI/ML Applications Now Part of Cloud Services. Class notes: https://irvine.georgetown.domains/607/

Mosbah, Mohamed Magdy, Soliman, Hany and El-Nasr, Mohamad Abou. (2013). Current Services in Cloud Computing: A Survey. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), vol. 3, no. 5, doi: 10.5121/ijcseit.2013.3501

Mytton, D. (2020). Assessing the suitability of the greenhouse gas protocol for calculation of emissions from public cloud computing workloads. Journal of Cloud Computing, 9(1), doi: 10.1186/s13677-020-00185-8

Nelson, M. (2009). Building an Open Cloud. Science, 324(5935) from http://www.jstor.org/stable/20536490

Ruparelia, Nayan B. (2016). Cloud Computing. Cambridge, MA: The MIT Press.

Rountree, Derrick and Castrillo, Ileana. (2014). The Basics of Cloud Computing: Understanding the Fundamentals of Cloud Computing in Theory and Practice. Amsterdam; Boston: Syngress / Elsevier.

Theocharaki, D. (2021). Cloud Monopoly. Class notes: https://blogs.commons.georgetown.edu/cctp-607-spring2021/2021/04/08/cloud-monopoly/

Figure 1: GIF from Giphy 

Figure 2: Photo of Project MAC’s, IBM 7094 from Multicians

Figure 3: Photo of H6180 Multics at MIT from http://gunkies.org/wiki/Multics 

Figure 4: Photo of Xerox PARC’s Alto system from Wired article “The 1970s Conference That Predicted the Future of Work” by Leslie Berlin 

Figure 5: Photo of Cloud Service Models diagram by David Choo

Figure 6: Screenshot from Ruparelia, Nayan B. (2016). Cloud Computing. Cambridge, MA: The MIT Press.

Figure 7: The Relationship between Services, Deployment Models, and Providers by Mosbah, Mohamed Magdy, Soliman, Hany and El-Nasr, Mohamad Abou. (2013). Current Services in Cloud Computing: A Survey. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), vol. 3, no. 5, doi: 10.5121/ijcseit.2013.3501

Figure 8: Representation of the Carbon footprint of ICT and data traffic development from Ericsson. (2021). ICT and the Climate. Ericsson. https://www.ericsson.com/4907a4/assets/local/reports-papers/consumerlab/reports/2020/ericsson-true-or-false-report-screen.pdf

Figure 9: Distribution of ICT’s carbon footprint in 2015 from Ericsson. https://www.ericsson.com/4907a4/assets/local/reports-papers/consumerlab/reports/2020/ericsson-true-or-false-report-screen.pdf