Friday 13 February 2015

Big Data In The Cloud

ETL and DWH are still my favourites and my full-time job, but since last year I have been loving working on technical POCs of Hadoop and Big Data.

Data is becoming more valuable, it's really true. So the demand for Big Data professionals in the market is high these days, which is why companies and products based on Hadoop are taking top positions.

As far as I know, Cloudera is currently leading the technology quadrant.

Recently I was reading a paper published by the Intel IT Center, and found the information worth sharing.

The paper describes how cloud and big data technologies are converging to offer a cost-effective delivery model for cloud-based big data analytics. It also includes:

· How cloud computing is an enabler for advanced analytics with big data.
· How IT can assume leadership for cloud-based big data analytics in the enterprise by becoming a broker of cloud services.
· Analytics-as-a-service (AaaS) models for cloud-based big data analytics.
· Practical next steps to get you started on your cloud-based big data analytics initiative.

The paper also explains how data analytics is moving from batch to real time, and how real time supports predictive analytics. You can read more on this here.

**************************************************************************** 

Apache Hadoop : an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets (Big Data) on computer clusters built from commodity hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures (of individual machines, or racks of machines) are commonplace and should therefore be handled automatically in software by the framework.
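To give a feel for the programming model Hadoop distributes across a cluster, here is a minimal, single-process sketch of the MapReduce idea in plain Python. This is only a conceptual illustration, not Hadoop's Java API: real jobs implement Mapper/Reducer classes and run over HDFS blocks.

```python
from collections import defaultdict

# Conceptual MapReduce word count: map emits (key, value) pairs,
# the shuffle groups them by key, and reduce aggregates each group.

def map_phase(document):
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

docs = ["Big Data in the cloud", "Data is becoming more valuable"]
pairs = [pair for doc in docs for pair in map_phase(doc)]
print(reduce_phase(shuffle(pairs)))   # e.g. {'data': 2, ...}
```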

Cloudera Inc. : an American software company that provides Apache Hadoop-based software, support, services, and training to business customers.
Cloudera's open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), targets enterprise-class deployments of that technology.

***********************************************************

Wednesday 31 July 2013

Gartner's Hype Cycle Explained


GARTNER - the best research and advisory company I have found to date....

Gartner, Inc. (NYSE: IT) is the world's leading information technology research and advisory company.

Gartner's Hype Cycles help technology planners decide when to invest in a technology. A Hype Cycle is a useful educational tool that:

· Establishes the expectation that most technologies will inevitably progress through the pattern of over-enthusiasm and disillusionment before proving their real value.

· Provides a snapshot of the relative level and pace of maturity of technologies within a certain segment of the IT world, such as a technology area, horizontal or vertical business market, or a certain demographic audience.

· Has a simple and clear message: companies should not invest in a technology just because it is being hyped, nor should they ignore a technology just because it is not living up to early over-expectations.

The Hype Cycle Graphic




Gartner's Hype Curve Breakdown (Source: Gartner)

Let's take some data from 2009 published by Gartner. We know the current scenario, so we can judge the accuracy of the predictions of this prestigious firm.

Gartner's Hype Cycle for Emerging Technologies in 2009

Just have a look at what Gartner said about cloud computing back in 2009....

Gartner's Hype Cycle for Cloud Computing in 2009



Gartner's Priority Matrix for Cloud Computing, 2009

The data above tells its own story, and I don't need to explain any further why we trust Gartner's analysis and predictions :) :) :)

Monday 29 July 2013

Amazon EC2...Amazing Cloud....

Amazon EC2’s simple web service interface allows enterprises to obtain and configure capacity with minimal friction. It provides enterprises with complete control of their computing resources that run on Amazon’s proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing enterprises to quickly scale capacity, both up and down, as the enterprise's computing requirements change. Amazon EC2 changes the economics of computing by allowing enterprises to pay only for capacity actually used. Amazon EC2 provides developers the tools to build failure resilient applications and isolate themselves from common failure scenarios.
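To make the "capacity in minutes" point concrete, here is a hedged sketch of obtaining and releasing an instance through the EC2 API using the boto3 SDK. The region, AMI ID, instance type and key pair name below are placeholders, not values from this post.

```python
import boto3

# Launch a single instance, wait for it to run, then terminate it
# (pay-as-you-go: billing stops once the instance is terminated).
ec2 = boto3.resource("ec2", region_name="us-west-2")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t2.micro",           # placeholder instance type
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",             # placeholder key pair
)
instance = instances[0]
instance.wait_until_running()
instance.reload()                      # refresh attributes such as DNS name
print(instance.id, instance.public_dns_name)

instance.terminate()
```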

On-Demand Instances Pricing



Amazon EC2 On-Demand Instances Pricing


Reserved Instances Pricing



Amazon EC2 Reserved Instances Pricing

Spot Instances Pricing

Amazon EC2 Spot Instances Pricing
Spot Instances pricing fluctuates periodically depending on the supply of and demand for Spot Instance capacity. The illustration shows a snapshot of pricing for the EU Region taken on Wednesday, January 13, 10:28:06 UTC, 2010.
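A hedged boto3 sketch of how one could observe that supply-and-demand fluctuation today by pulling recent Spot price history; the instance type and region are placeholders and this API post-dates the snapshot above.

```python
import boto3
from datetime import datetime, timedelta, timezone

ec2 = boto3.client("ec2", region_name="eu-west-1")

# Query the last six hours of Spot prices for one instance type.
resp = ec2.describe_spot_price_history(
    InstanceTypes=["m5.large"],             # placeholder instance type
    ProductDescriptions=["Linux/UNIX"],
    StartTime=datetime.now(timezone.utc) - timedelta(hours=6),
    EndTime=datetime.now(timezone.utc),
)
for point in resp["SpotPriceHistory"]:
    print(point["Timestamp"], point["AvailabilityZone"], point["SpotPrice"])
```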

Internet Data Transfer Pricing

The pricing below is based on data transferred "in" and "out" of Amazon EC2.

There is no Data Transfer charge between Amazon EC2 and other Amazon Web Services within the same region (e.g. between Amazon EC2 in US West and Amazon S3 in US West). Data transferred between Amazon EC2 instances located in different Availability Zones in the same Region is charged at the Regional Data Transfer rate. Data transferred between AWS services in different regions is charged as Internet Data Transfer on both sides of the transfer.
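The rules above reduce to a small decision function. The sketch below encodes them; the per-GB rates are illustrative placeholders, not Amazon's published prices.

```python
# Toy encoding of the transfer-charging rules described above.
REGIONAL_RATE_PER_GB = 0.01     # placeholder rate
INTERNET_RATE_PER_GB = 0.10     # placeholder rate

def transfer_charge(gb, src_region, dst_region, src_az=None, dst_az=None):
    if src_region != dst_region:
        # Cross-region traffic is billed as Internet data transfer
        # on both sides of the transfer.
        return 2 * gb * INTERNET_RATE_PER_GB
    if src_az and dst_az and src_az != dst_az:
        # Different Availability Zones in the same region:
        # Regional Data Transfer rate applies.
        return gb * REGIONAL_RATE_PER_GB
    # Same region, e.g. EC2 US West to S3 US West: free.
    return 0.0

print(transfer_charge(100, "us-west-1", "us-west-1"))              # 0.0
print(transfer_charge(100, "us-west-1", "us-west-1", "1a", "1b"))  # 1.0
print(transfer_charge(100, "us-west-1", "eu-west-1"))              # 20.0
```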

Amazon Internet Data Transfer Pricing



Amazon Elastic Block Storage (EBS) Pricing


AWS Import/Export Service

AWS now offers a physical data import/export service that makes it easy to quickly transfer large amounts of data into and out of the AWS Cloud. It is an economical alternative to sending large volumes of data across the Internet. The AWS Import/Export service allows 2 TB of data to be imported or exported globally from Amazon S3. With that service, customers can send Amazon a blank storage device and Amazon will copy the contents of one or more Amazon S3 buckets to it before shipping it back. Alternatively, customers can send Amazon a storage device full of data, and Amazon will copy it to the S3 buckets of the customer's choice. A rough break-even sketch against plain Internet transfer follows the list below. Customers can use AWS Import/Export for:

· Data Migration
· Offsite Backup
· Direct Data Interchange
· Disaster Recovery
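Here is the back-of-the-envelope sketch referenced above, comparing wire transfer time against shipping a device. The uplink speed, utilization and shipping turnaround are assumptions I am supplying for illustration, not AWS figures.

```python
# When does shipping a device beat the wire?
def transfer_days(data_tb, uplink_mbps, utilization=0.8):
    bits = data_tb * 1e12 * 8
    seconds = bits / (uplink_mbps * 1e6 * utilization)
    return seconds / 86400

data_tb = 2.0         # the 2 TB figure mentioned above
uplink_mbps = 100     # assumed dedicated uplink
shipping_days = 5     # assumed door-to-door turnaround for a device

wire = transfer_days(data_tb, uplink_mbps)
print(f"Internet transfer: {wire:.1f} days, Import/Export: ~{shipping_days} days")
# With these assumptions the wire takes about 2.3 days, so shipping pays off
# mainly for larger volumes or slower links.
```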

Saturday 27 July 2013

FUTURE RESEARCH IMPLICATIONS

An interesting future research objective would be to revisit this CBA when enforceable environmental laws applicable to the ICT sectors are enacted. A change in the European legislative landscape, including the Carbon Trading Scheme and the introduction of effective tax incentives for enterprises that comply with the EC Code of Conduct requirements, would affect the result of the financial analysis with respect to the quantification and monetary valuation of the environmental benefits. I think it is important to keep an eye on the enactment of similar environmental laws in the US and in emerging countries like India and China, because these fast-growing economies raise concerns about prospective increases in GHG emissions. To echo Greenpeace's concerns about cloud computing's possible negative impact on the environment, it may prove critically important to dig further into the issue of how big the cloud really is when it comes to electricity consumption and GHG emissions, and how big it will become given its rapid growth and given that many major cloud brands refuse to disclose their energy footprint.

Another issue worth investigating further concerns the extent to which European economies are becoming increasingly dependent upon US-centric firms like Google and Microsoft for the procurement of computing resources as the cloud, as a utility computing grid, becomes ubiquitous.

A corollary business sustainability issue related to the widespread use of cloud computing for the firm's business processes resides in the diffuse control of the Internet as the broadband conduit linking datacenters together, and the relative fragility of its architecture. Lawrence G. Roberts, one of the founders of the Internet, says, in an address to the IEEE organization, that the Internet is broken, and that network routers are too slow, costly, and power hungry (Roberts 2009). Today's Internet traffic is rapidly expanding and also becoming more varied and complex, in particular due to an explosion in voice and video traffic. The shift is not without problems, he says, even though everybody is using Skype or YouTube today without too much of a hitch, because the packet switching technology at the heart of the Internet's TCP/IP protocol was not designed for that type of application. Packet switching routers around the world are becoming increasingly congested, causing quality-of-service deterioration. This may not be perceptible today because the Internet has been grossly over-provisioned by network operators who deployed mountains of optical fiber during the dot-com era, but at the current rate of growth, cloud computing combined with the massive arrival of the iPad, iPhone, netbooks and other tablet computers may put the viability of the Internet at risk. The resulting effects would be devastating for those enterprises that rely heavily on cloud computing to perform their business operations.

Tuesday 9 July 2013

My Thinking Till Now For Cost Benefit Analysis (CBA)

The CBA of the migration project for the software development and test activities at GEC to the AWS cloud shows positive financial results. However, it was not possible to demonstrate that the assumed environmental benefits of cloud computing played a significant role there. By migrating parts of the computing resources of the datacenter to the AWS cloud, the financial analysis demonstrated that GEC could achieve significant cost savings in the areas of hardware equipment costs and electricity consumption costs for the servers' power and cooling, as well as in user productivity gained from the better effectiveness of the hybrid cloud solution. The financial analysis shows that GEC could obtain a risk-adjusted return on investment (ROI) of 117%, with a payback period of 9 months, by migrating its software R&D's development and test activities to the AWS cloud. However, the initial assumption about the environmental benefits of cloud computing, resulting from higher computing efficiency, could not be objectively quantified in the analysis. Failure to do so can be explained by two main reasons:
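For readers unfamiliar with the two headline metrics above, here is a hedged sketch of how simple ROI and payback month are computed from a stream of monthly net benefits. The figures are invented placeholders, not GEC's actual cash flows.

```python
initial_investment = 100_000.0         # placeholder migration cost
monthly_net_benefit = [9_000.0] * 36   # placeholder, 3-year horizon

def roi(investment, benefits):
    # Simple (non-discounted) ROI over the analysis period.
    return (sum(benefits) - investment) / investment

def payback_month(investment, benefits):
    cumulative = 0.0
    for month, benefit in enumerate(benefits, start=1):
        cumulative += benefit
        if cumulative >= investment:
            return month
    return None

print(f"ROI: {roi(initial_investment, monthly_net_benefit):.0%}")                   # 224%
print(f"Payback: month {payback_month(initial_investment, monthly_net_benefit)}")   # 12
```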

Firstly, it is not disputed that cloud computing can save billions of kW-hours in energy consumption, because cloud providers can squeeze much higher levels of performance and efficiency out of their infrastructures than private datacenters can, especially when compared to those of small firms with limited innovation and cash resources. But while the energy efficiency benefits of cloud computing are generally not contested, claiming that cloud computing is a green technology is a totally different story, as reported by a number of ICT practitioners and NGOs like Greenpeace. Despite the fact that some cloud providers are reaching extremely low PUEs, and are also looking to build massive datacenters in places chosen to maximize energy efficiency and harness renewable or clean energy, the primary motivation is cost containment, which doesn't necessarily meet environmental and social responsibility objectives. The study showed that while energy efficiency reduces the energy consumption footprint, it is not green if cloud providers are simply looking at maximizing output from the cheapest and dirtiest source of energy available, such as Microsoft's Chicago cloud, which draws power for its datacenter from a coal-burning electricity grid.

Secondly, the current body of environmental legislation enacted by governments and regulatory organizations that applies to the ICT sectors is largely not quantifiable in financial terms. This observation, I think, is consistent with the findings of this study and with the common perception that the economics of green IT are stimulated primarily by the concern of cutting costs in areas of energy-related expenses as well as hardware and maintenance expenses. In other words, “doing the right thing for the environment” is not sufficiently rewarded by today's legislation (see “Energy Policies and Implications”). For example, the EU Emission Trading Scheme (ETS), which regulates the emission of greenhouse gases for the energy sector and other heavy energy-consuming industries, is not enforceable (yet) for the ICT industry sectors. Energy policies that are of importance to the ICT industry sectors, including the EU Energy Performance of Buildings Directive, the EC Code of Conduct on Data Centers Energy Efficiency, and the Grenelle of the Environment in France, have had, so far, minor to zero financial impact on the datacenter sector. All this may change in the future, but at the time of this writing it is the current state of business.

Saturday 6 July 2013

Total Economic Impact Methodology

In this dissertation I will apply The Total Economic Impact™ Methodology: A Foundation For Sound Technology Investments by Forrester, as described in (Gliedman 2008), (Erickson & Hughes 2004) and (Leaver 2009). The Total Economic Impact (TEI) methodology is the product of field practitioners' and industry analysts' work with Forrester. The goal of this methodology is to provide a practical and compelling framework that embraces all the critical components of quantified—as opposed to fuzzy—risk and flexibility analysis of a business case template for ICT investments.

Given the increasing sophistication that enterprises have regarding cost analysis related to ICT projects, Forrester's TEI methodology provides a complete picture of the total economic impact of an ICT project by looking at four fundamental financing decision points with associated tools and methodologies for quantification.

Benefits : the TEI methodology calculates the benefit of a technology investment decision in a given use-case scenario. TEI quantifies both tangible and intangible benefits and their dependencies by identifying and calculating their positive business impacts, such as efficiency or revenue gains, over the period of analysis.

Cost : TEI looks to determine the cost of investing in a new initiative, application, or technology by analyzing the change to ICT and business operations caused by the new technology investment compared with the cost of maintaining the current environment over a given period that can include planning, implementation, maintenance, and the associated internal efforts and resources.

Risk : to reduce the marginal error of the estimated benefit and cost, TEI quantifies the impact of risk to establish a more realistic view of likely outcomes by tempering initial benefit estimates to compensate for environmental and technical uncertainty. The result is a risk-adjusted estimate that is most likely a more accurate predictor of the future.

Flexibility : to provide visibility into the investment life cycle, TEI values the future options that are created by the investment decision and estimates the future likely impact of ICT investments by monetizing values of future options created that often result from infrastructure, application architectures, excess capacity and similar platform investments.

The TEI quantification of benefits, cost, risk and flexibility is illustrated in the illustration below:



The Four Elements of TEI: Benefits, Cost, Risk and Flexibility for Financial Analysis (Graphic courtesy of Forrester Research, Inc.)

Benefits Measure Future Positive Impacts of the Project

The TEI methodology applies a rigorous process and best practices to improve accuracy in valuing technology benefits, as described in (Gliedman 2008) and (Erickson & Hughes 2004). These consist of the following (a small worked sketch follows the list):

· Establishing categories of tangible benefits to quantify.

· Establishing quantifiable metrics for each benefit.

· Establishing current baselines and future projections for each metric.

· Establishing an “exchange rate” for the metric.
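A hedged sketch of those four steps for a single benefit category (developer hours saved), combined with the TEI risk-adjustment step described earlier. All figures are illustrative placeholders, not values from the dissertation.

```python
baseline_hours_per_month = 400     # current-state baseline for the metric
projected_hours_per_month = 250    # future projection with the new solution
exchange_rate_per_hour = 55.0      # assumed loaded hourly cost (the "exchange rate")
risk_adjustment = 0.85             # temper the estimate for uncertainty (TEI risk step)

# Monthly benefit = hours saved valued at the exchange rate.
monthly_benefit = (baseline_hours_per_month - projected_hours_per_month) * exchange_rate_per_hour
risk_adjusted_annual_benefit = monthly_benefit * 12 * risk_adjustment
print(f"Risk-adjusted annual benefit: ${risk_adjusted_annual_benefit:,.0f}")
# 150 hours * $55 * 12 months * 0.85 = $84,150
```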

Friday 28 June 2013

Cloud Computing as a Green IT Strategy

Capitalizing on advances in microprocessor power and data storage capacity, firms like Amazon and Google are beginning to build massive and highly efficient information-processing infrastructures that use the broadband Internet to reach customers. In 2008, Google was said to be operating a global network of about three dozen datacenters around the world loaded with more than 2 million servers, although this information may be incomplete as Google is very secretive about the location of its datacenters. According to Google’s earnings reports, the company spent $US1.9 billion on datacenters in 2006, and $US2.4 billion in 2007. Google unveiled four new datacenter projects in 2007, each with a cost estimate of $US600 million, which includes everything from construction to equipment and computers. Both Microsoft and Google have extremely efficient large-scale datacenters; both companies are aiming for an industry-leading PUE of 1.12 in their computing centers (Wheeland 2009). Expanding the use of these services means more incentive to concentrate ICT operations in top-of-the-line facilities, and will continue the shift.


To exemplify the above, an article published in June 2006 by The New York Times (Markoff & Hansell 2006) unveiled Google's project to build the largest and most sophisticated datacenter on the planet near a small town on the banks of the Columbia River, named The Dalles, in northern Oregon. Today, the site features three 68,680-square-foot windowless warehouses designed to host hundreds of thousands of computers all working together as a single machine to deliver content over the Internet: a kind of information-processing “dynamo” of unprecedented power, comparable to a nuclear power plant for generating electricity, as stated in (Carr 2009b). Since then, The Dalles has become a symbol for the datacenter industry’s growing need for massive amounts of electric power. In its March 2008 issue, Harper's Magazine published in (Strand 2008) one section of the official blueprints of the site plan, estimating roughly that once all three server buildings are operational in 2011, the plant can be expected to demand about 103 megawatts of electricity, enough to power 82,000 homes. The Web, the magazine says, "is no ethereal store of ideas, shimmering over our heads like the aurora borealis. It is a new heavy industry, an energy glutton that is only growing hungrier."



Google is not alone. Microsoft is also investing billions of dollars in very large computing grids, such as its datacenter in Northlake, a suburb of Chicago, which, covering 500,000 square feet (46,000 square meters) and costing $US500 million, is one of the biggest, most expansive and sophisticated datacenters on the planet. The entire first floor is designed to be crammed with 200 forty-foot (13-meter) containers, each loaded with up to 2,500 servers. To support the Northlake datacenter's electricity needs, Microsoft has created three electricity substations that can distribute up to 200 megawatts, that is, as much as a small aluminium smelter. Other Internet giants like Yahoo! are also busy building large server farms. In 2008, half a dozen were being built in Quincy, in the middle of Washington state, close to the Columbia River. Other massive datacenters are being built in the UK too. For example, Rackspace has built a large datacenter on Slough Estates that will run on renewable energy and will use low-power equipment such as AMD's Opteron processor and HP's c-Class blade servers. The company has partnered with organizations such as NativeEnergy and the International Tree Foundation in the UK to enable carbon-neutral operations through offset programs.



Neither Amazon, Google nor other major providers would officially comment on their datacenters' efficiency levels. However, they argue that thanks to their large customer base, they can make large investments in efficiency innovations, which smaller firms cannot achieve on their own, leading to a continuous optimization of their infrastructure that ultimately benefits both parties. It is commonly reported that a typical PUE for a cloud-based infrastructure is around 1.2 or below, whereas the average datacenter PUE is 2.5 (Wheeland 2009). Furthermore, we see through initiatives like the EC2 Spot Instances program that maximizing the utilization rate of the datacenter is of primary concern, since the worst thing for a cloud provider is to maintain an inventory of unused capacity.
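To show what those PUE figures mean in practice, here is a small sketch. PUE (Power Usage Effectiveness) is total facility energy divided by IT equipment energy; the IT load below is an illustrative assumption.

```python
# Overhead energy (cooling, power distribution, etc.) implied by a given PUE.
def overhead_kwh(it_load_kwh, pue):
    return it_load_kwh * (pue - 1)

it_load_kwh = 1_000_000  # illustrative annual IT load
for label, pue in [("cloud facility", 1.2), ("average datacenter", 2.5)]:
    print(f"{label}: PUE {pue} -> {overhead_kwh(it_load_kwh, pue):,.0f} kWh of overhead")
# The same IT work costs 200,000 kWh of overhead at PUE 1.2
# and 1,500,000 kWh at PUE 2.5.
```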



Furthermore, cloud computing practices promote worker mobility, reducing the need for office space, buying new furniture, disposing of old furniture, having the office cleaned with chemicals and trash disposed of, and so on. They also reduce the need for driving to work and the resulting carbon dioxide emissions.



But while the environmental energy efficiency benefits of cloud computing are generally not contested, all the discussions about cloud computing being an effective strategy toward green IT actually miss the point, according to an inflammatory report released by Greenpeace in March 2010. This report, "Make IT Green: Cloud Computing and its Contribution to Climate Change," updates and extends some of the research published in 2008 in the Smart 2020 report on how IT contributes to climate change, and finds that the Year of the Cloud is only going to make things worse (Wheeland 2010) and (Greenpeace 2010).



The concern Greenpeace expresses in this report is that despite an increasing focus on PUE, and despite efforts to constantly make computing facilities more efficient, cloud computing is never going to make enough of a dent in greenhouse gas emissions without the involvement of constraining national and supranational regulations. This is because, despite the fact that some providers are reaching extremely low PUEs and are also looking to build their datacenters in places so as to maximize energy efficiency and harness renewable or clean energy, “it is still a tiny slice of the pie of both new and existing datacenters, and the ones that are not using renewable energy or free cooling are the biggest part of the problem”.



Greenpeace alleges in this report that while cloud computing companies are pursuing strategies to reduce the energy consumption of their datacenters, their primary motivation is cost containment, and the environmental benefits of green datacenter design are generally of secondary importance. Increasing the energy efficiency of servers and reducing the energy footprint of datacenter infrastructure are a must, but efficiency by itself is not green if you are simply working to maximize output from the cheapest and dirtiest energy source available, says Greenpeace (2010). In this respect, Greenpeace lays out how dirty some of the most renowned cloud providers' biggest datacenters are:


Comparison of significant cloud providers' datacenter fueling energy mix (Graphic courtesy of Greenpeace International)

Google's Dalles facility does the best job, with 50.9 percent renewable energy from hydroelectric power. Microsoft's Chicago facility does the worst job, with 1.1 percent renewable energy and 72.8 percent from coal-burning electricity.

But Greenpeace's concerns about cloud computing's negative environmental impact do not stop here. They argue that “The arrival of the iPad and growth in netbooks and other tablet computers, the launch of Microsoft’s Azure cloud services for business, and the launch of the Google phone and the proliferation of mobile cloud applications are compelling signs of a movement towards cloud-based computing within the business sector and public consciousness in a way never seen before.”

So another burning question Greenpeace is posing about cloud computing is just how big the cloud really is when it comes to electricity consumption and GHG emissions and how big will it become given its rapid growth, and given that many major cloud brands refuse to disclose their energy footprint.

The Smart 2020 analysis had already forecast that the global carbon footprint of the main components of the cloud (datacenters and the telecommunications network) would see their emissions grow, on average, 7% and 5% respectively each year between 2002 and 2020, with the number of datacenter servers growing on average 9% each year during this period. The new report adjusts the Smart 2020 forecast of the electricity demand of the global cloud, highlighting the impact of projected IT demand and the importance of where, and from what sources, electricity is being used to power Google, Amazon and other cloud-based computing platforms. Table 5 is a projection of growth in ICT electricity consumption and GHG emissions by 2020, using the 9% annual growth rate estimated in the Smart 2020 report for datacenters and a recent estimate by Gartner of 9.5% a year for growth in telecommunications.
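The arithmetic behind such projections is plain compound growth. The sketch below applies the growth rates quoted above; the 2002 base values are placeholders I am assuming purely for illustration, not figures from the reports.

```python
# Compound growth: value_n = base * (1 + rate) ** years
def project(base, annual_rate, years):
    return base * (1 + annual_rate) ** years

base_datacenter_emissions = 76   # placeholder MtCO2e in 2002
base_telecom_emissions = 150     # placeholder MtCO2e in 2002
years = 2020 - 2002

print(f"Datacenters at 7%/yr:      {project(base_datacenter_emissions, 0.07, years):.0f} MtCO2e")
print(f"Telecom network at 5%/yr:  {project(base_telecom_emissions, 0.05, years):.0f} MtCO2e")
# e.g. 76 * 1.07**18 is roughly 257, i.e. growth by a factor of ~3.4 over the period.
```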

Using the Environmental Protection Agency's Greenhouse Gas Equivalencies Calculator, I found that 1034 million metric tons of carbon dioxide equivalents (MMTCO2Eq) represent the CO2 emissions from the electricity use of 125 million homes for one year!

Therefore, according to Greenpeace, cloud providers should build new datacenters in areas that provide cleaner energy mixes for their grid, and push regulatory bodies, in the regions where their existing datacenters are housed, to add more renewable energies to the grid.

Saturday 22 June 2013

Cloud Computing as an IT Efficiency Strategy

The Economist, in a special report on corporate IT entitled “Let It Rise”, pins down the current state of inefficiency of datacenters worldwide. The Economist claims that the 7,000 home-grown datacenters in North America alone are notorious for their inefficiency and, relaying findings from McKinsey and the Uptime Institute, that on average only 6% of server capacity is used. Of even more concern is the assumption that nearly 30% of the servers in these datacenters are no longer in use at all, but no one bothers to remove them. It is claimed that often nobody knows which application runs on which server, and so the method used to find out is to “pull the plug and see who calls” (Siegel 2008, p.3). For years, ICT departments kept adding machines whenever new applications were needed, which over the years led to a situation known as server sprawl. The illustration below shows the worldwide spending on datacenters since 1996, projected to reach $250 billion by 2011.

                                     Datacenter Worldwide Spending (Graphic Courtesy of IDC)

Prior to the economic downturn of 2009, adding servers was not too much of an issue because entry-level servers were cheap and ever-rising electricity bills were generally charged to the company's facilities budget rather than to the ICT department's budget. But as stated by IDC, this is changing.

Cloud computing as an energy-efficient outsourcing solution deserves some attention. Fleischer and Eibisch (2007) of IDC discuss the business incentives for ICT outsourcing from an increased datacenter efficiency perspective. They report that in 2007 around 50% of companies were still hosting their Web sites and e-business infrastructure internally and that this trend had been consistent over recent years. However, they believe that many companies that do in-house hosting are underestimating the total costs involved, due mainly to the rising costs of power and cooling and the gradual shift of costs such as power from facilities departments to ICT organizations. This claim is substantiated by a survey conducted by IDC in 2006 showing that 13% of companies' total datacenter operational expenditure went on electricity, and that respondents expected that proportion to increase to 20% within a year.

In times of cost-cutting, when companies are striving to reduce fixed costs not directly related to their core businesses, the concern of datacenter inefficiency becomes more pressing. For this reason, IDC believes that many datacenters will be modernized and consolidated, but the cost of modernizing and refitting existing facilities is extremely high and will have a major impact on overall ICT budgets, one that is beyond the reach of many organizations, primarily SMBs. Because of that, many enterprises will need to consider fitting out new datacenter facilities in the near future. A new and more efficient datacenter that consumes less power is a greener datacenter, and even more so if further consideration is given to sourcing renewable power, geographical location or reuse of the generated heat, for example. Considering the source of power generation is also very important, as it is possible to reduce a datacenter's power consumption while still seeing an increase in carbon footprint if the power source is switched from, say, nuclear to coal.

Numerous studies, including one conducted by Greenspace, an Illinois-based vendor of green building supplies, support the claim that cloud customers can save billions of kW-hours in energy consumption, and so foster the idea that cloud computing is greener than traditional datacenters because providers are able to squeeze the performance and efficiency of their infrastructures to much higher levels of compute resource utilization than individual companies, especially small firms with fewer ICT resources. But whether cloud computing is a green technology or not is a totally different question, as you will read in the next post.

Monday 17 June 2013

Cloud Computing as a Strategy

The potential business benefits of Green IT, along with energy-saving pressures, should make ICT managers look at ways of increasing the efficiency of their operations. These issues need to be addressed, but in the short term they will remain highly complex. This dilemma should accelerate the move towards the energy-efficiency value proposition of the cloud computing model, which presents itself as one of the viable options to reduce much of the risk associated with a datacenter's inefficiency, especially for non-core applications such as Web applications. With regard to the future legislative landscape, it is extremely important that ICT managers begin planning and implementing a methodology to better understand their own carbon footprint and efficiency today, to ensure that operations are ready once legislation is approved by the EU and enforced by the member states.


However, Greenpeace observes that the cloud phenomenon may aggravate the overall climate change situation because the collective demand for more computing resources will increase dramatically in the next few years. Even the most efficiently built datacenter with the highest utilization rates will only mitigate, rather than eliminate, harmful CO2 emissions until regulatory measures are taken by governments to incite the generation and use of renewable energy sources in cloud computing infrastructures.

Refer to my upcoming two posts to explore the above further and to read in more detail about Cloud Computing as a Green IT Strategy and Cloud Computing as an IT Efficiency Strategy.

Thursday 13 June 2013

Cloud Computing Relationships with Green IT

Perhaps I should begin the discussion about the relationship between cloud computing and green IT with the statement that the connection between the two concepts is not well established, and that as far as cloud computing goes there is a lot of “buzz”, confusion and, soon to come, disillusionment. An article by Tom Jowitt in Network World illustrates this situation with the findings of Rackspace's Green Survey of 2009, which discloses skepticism about the green benefits of cloud computing and demonstrates that cost savings and datacenter consolidation are the current issues driving the green IT agenda (Jowitt 2009).

The survey reports that cost savings are proving to be the biggest driver of environmentally responsible ICT decisions, and that companies are still concerned with green initiatives and are continuing on the track of sustaining and improving their environmentally friendly policies (Rackspace 2009). But in the 2009 edition of the Green Survey, Rackspace added a new question about whether its customers view cloud computing as a greener alternative to traditional computing infrastructure. The result was rather disappointing, indicating that only 21% agreed that cloud computing was a much greener alternative, against 35% who were not convinced of its green benefits, as illustrated below.

Rackspace customer views on cloud computing as a greener alternative to traditional computing infrastructures? (Graphic courtesy of Rackspace)

Rackspace customer views on how cloud computing fit into their environmental initiatives (Graphic courtesy of Rackspace)

Furthermore, only 7% of Rackspace's customers think that cloud computing is critical for their company to become greener, and 46% say that cloud computing is not part of their overall environmental strategy.

Instead, the survey indicates that they are relying on more traditional green initiatives. Seventy one percent have undertaken or are focusing on recycling; 31% on datacenter consolidation; 29% on transportation (car pooling and travel restrictions); 10% on renewable energy; 10% on carbon footprint; and 2% on LEED certification (Rackspace 2009).

In order to better understand why a sizable proportion of Rackspace's customers have the perception that cloud computing is not critical for their company to become greener and that it is not part of their environmental strategy, we need to enter with some detail into what is intended behind the concept of green IT.

Green IT is an umbrella term that Forrester defines in (Washburn & Mines 2009, p.3) as “IT suppliers and their customers reducing the harmful environmental impacts of computing.” Forrester claims in (Washburn & Mines 2009, pp.3-4) that achieving green IT objectives in an organization implies five types of core activities.

Energy efficiency and management: Energy efficiency and management involves reducing energy expenses resulting from the sprawl of server and storage farms, intensive datacenter cooling, and distributed IT assets such as PCs and printers. It is, for most organizations, the major driving force towards Green IT. Energy efficiency can be improved by provisioning more efficient brands of PCs, monitors, power supplies, servers and cooling equipment. Environmentally aware procurement organizations are leaning towards equipment that complies with energy-efficiency standards like Energy Star, while the Electronic Product Environmental Assessment Tool (EPEAT) helps make the right purchasing decisions. Energy management, in turn, leans towards energy conservation by powering down IT assets when not in use or by using renewable sources of energy.

Equipment and resource reduction: Reducing the IT equipment footprint by decommissioning and consolidating underutilized equipment reduces energy consumption and proactively curbs electronic waste, or e-waste.

Life-cycle and e-waste management: To limit the fast growing proliferation of hazardous materials such as cadmium, lead, and mercury, CIOs should buy less ICT equipment, use less, lengthen the life cycles of ICT assets, and ensure the responsible reuse, recycling, and disposal of IT assets at their end of life.

Support for green corporate initiatives: The new corporate initiatives trend is prompting CIOs to better understand how to provide ICT infrastructure, applications, and expertise to improve the sustainability of business processes and operations outside of ICT, such as support for telecommuting and teleconferencing, paperless billing, building automation, and enterprise-wide carbon and energy management.

Governance and reporting: Governance and reporting is viewed as an important process element to green IT, including setting goals, documenting policies, capturing best practices, and reporting progress.

Sunday 26 May 2013

Cloud Computing Benefits


The main economic appeal of cloud computing lies in its usage-based pricing model, often described as “converting capital expenses (CAPEX) to operating expenses (OPEX)”. Usage-based pricing is different from renting in that renting a resource involves paying a negotiated fee to have the resource available over a period of time, whether or not the resource is actually used. Usage-based pricing, or pay-as-you-go pricing, involves metering usage and charging fees on a fine-grained usage basis, independently of the time period over which the usage occurs. With Amazon EC2, for example, it is possible to buy computing resources by the hour and storage by the GB. In addition, hours purchased can be consumed non-uniformly, in that 100 server-hours purchased can be used fully on the same day of purchase, the day after, or at some later time.
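A toy illustration of that fine-grained billing model: the bill depends only on the hours and gigabytes consumed, not on when they are consumed. The unit prices are placeholders, not Amazon's list prices.

```python
PRICE_PER_INSTANCE_HOUR = 0.10   # placeholder $/instance-hour
PRICE_PER_GB_MONTH = 0.08        # placeholder $/GB-month

def monthly_bill(instance_hours, storage_gb_months):
    # Usage-based pricing: metered hours plus metered storage, nothing else.
    return instance_hours * PRICE_PER_INSTANCE_HOUR + storage_gb_months * PRICE_PER_GB_MONTH

# 100 server-hours cost the same whether used in one burst or spread out over weeks.
print(monthly_bill(instance_hours=100, storage_gb_months=50))   # 14.0
```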
Given the economics of cloud computing and the new business models emerging around the delivery of cloud-based services, new applications can be created and delivered at a radically lower cost compared to conventional approaches. As such, industry analysts and ICT practitioners have agreed upon several major benefit forces that should drive the adoption of the cloud.

Collaboration and Community Computing Benefits
As the globalization trend continues, distributed work has become an everyday reality in large organizations. Many existing on-premises applications were originally designed to support employees in same-time, same-place working styles. By contrast, cloud-based productivity tools (for example, Google Apps, Microsoft Office Live Workspace, Intuit’s QuickBase, Facebook) are inherently collaborative and accessible anywhere, including from home. Community computing and collaboration in the cloud bring benefits that are not easily attainable with local computing, such as the detection of distributed denial-of-service (DDoS) attacks or spam, as cloud platforms with wide visibility on Internet traffic can detect the onset of an attack more quickly and accurately than any local threat detector.

Cloud Computing Costs
Doing like-for-like comparisons between cloud computing and in-house datacenters to run an enterprise business application is a difficult task because it is easy to neglect many of the indirect and hidden costs incurred by operating a datacenter. In fact, there are many arguments and counterarguments surrounding the total cost of ownership (TCO) of hosting in-house compared with using cloud-based services. This is because each organization has its own capital and operational cost structures and its own break-even point, but IDC argues that most companies with relatively standard ICT and Web deployments will achieve lower TCO by using a managed hosting service than by hosting in self-owned and managed facilities. However, a simple comparison of costs for self-owned versus hosted facilities is typically not possible, even for small companies, due to the large number of indirect and hidden costs affecting in-house operations that are overlooked.
In support of this statement, IDC argues that “too many companies inappropriately compare the headline costs of in-house operations and managed services when they evaluate the two side by side, such as the capital cost of servers versus monthly recurring fees. The range of costs necessary to run a decent-quality hosting operation in-house is wider than many companies appreciate, and in-house cost cutting can be illusory, creating more in risk than it saves in cost.”
To help out with this issue, Amazon developed, in the “Economics of the AWS cloud vs. Owned ICT Infrastructure”, a comparative analysis of several direct and indirect costs entailed by owning the facility versus using the AWS cloud, which will be used hereafter. In this Section, I will strive to sum up all the direct and indirect costs that apply to operating a self-owned datacenter and show how they compare to using cloud-based managed services. This outline will be used hereafter as a calculation basis for the TCO of the reference use case.
Operating a self-owned datacenter incurs a number of tangible asset's capital or lease costs and other landlord fees, as well as personnel costs that broadly divide into three categories:
·        Datacenter facility costs that include: building maintenance and upkeep, fit-out costs, technical space maintenance and refurbishment, two or more fiber ducts and fiber services to the building, power plant, backup power generators, fuel storage, chillers, physical security systems (access control, CCTV, security presence, etc.), fire suppression, racks, cabling, and so on. To be included are business continuity redundancy for most of these components, and insurance for all of them.
·        Computing equipment costs that include: depreciation, planned life-cycle replacement, unplanned replacement, backup/hot swap, spare parts inventory (onsite or with the supplier), power and cooling costs, software licenses, system monitoring, and system security (IDS, email security, DDoS mitigation, etc.).
·        Personnel costs that include: salaries and related overheads of facilities and security staff to operate the physical datacenter as well as of ICT staff to manage the technical environment; cover for staff absence; attrition costs; training; staff facilities.
This is only a subset of the costs a company necessarily incurs in operating its own hosting operations. While many companies, depending on the scale of their operations, make do without some of these components, they typically incur risk in return for the cost saving (for example, by cutting back on redundancy, not deploying a DDoS mitigation capability, or under-resourcing the operation in staff terms). A company that uses a managed hosting service will still pay these costs, but the main assumption about cloud computing's cost-saving opportunities discussed so far is that these costs are shared across all customers of the service provider and, through the economies of scale the hosting provider can achieve, the customer will pay only a fraction of what the equivalent in-house operations would cost.
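To show the shape of the comparison the outline above feeds into, here is a toy side-by-side TCO sketch. Every figure is an invented placeholder meant only to illustrate the structure, not a real quote for either option.

```python
years = 3

in_house = {
    "facility (power plant, cooling, security, racks)": 120_000,
    "computing equipment (servers, spares, licenses)":  180_000,
    "personnel (ICT and facilities staff, training)":   300_000,
}
cloud = {
    "instance-hours and storage fees":                  210_000,
    "data transfer and support plan":                    40_000,
    "personnel (reduced operations staff)":             150_000,
}

def total(costs):
    # Sum all cost categories for the analysis period.
    return sum(costs.values())

print(f"In-house TCO over {years} years: ${total(in_house):,}")
print(f"Cloud TCO over {years} years:    ${total(cloud):,}")
print(f"Difference: ${total(in_house) - total(cloud):,}")
```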

Conclusion
I think that for most organizations, outsourcing to the cloud should reduce risks and hence costs. The cost elements outlined above all present risk as well as cost to an organization in terms of service disruption resulting in lost orders. Many companies operate internal ICT SLAs, but in the face of a major disruption affecting operations, internal SLAs are effectively worthless. An SLA from an external service provider would typically not cover the cost of lost business or customer dissatisfaction, but can go some way to mitigating the financial impact. More significantly, if stringent enough, the SLA should act as a major incentive for the service provider to fix problems quickly and well. A strong SLA does not nullify risks, but will reduce the financial impact by ensuring quick problem resolution and a level of loss buffering.
Cloud computing is still at an early stage. Therefore, to a significant extent, its technological and business models are as yet unproven. Cloud computing is not necessarily for everyone, nor for every type of application. It is probable, though, that data security and privacy compliance concerns will prevent a rapid adoption of public cloud solutions in heavily regulated industries, and in many global companies that operate in multiple jurisdictions, as stated by Gartner in (Logan 2009). That is why a company considering moving applications to the cloud must be conscious of its security policies and regulatory compliance constraints. Beyond that, I believe that issues around data security and privacy risks in the cloud have been overly emphasized. This is also the opinion of several field experts who spoke at the Kuppinger Cole and Cloud Slam 2010 virtual conferences on cloud computing and security that I had the opportunity to attend.
There seems to be a consensus around the idea that cloud computing is not inherently insecure or even less secure than traditional ICT. The cloud way may even be more secure than many poorly managed information systems where traditional ICT is incapable of providing the same level of expertise and control over production systems, for reasons as diverse as insufficient staff or a limited budget for training and hiring top-of-the-line security experts. As a matter of fact, internal ICT teams can hardly compete with the budget and level of expertise deployed by the big cloud computing vendors to effectively secure their infrastructure from a physical and logical standpoint. In addition, it should be well understood that companies always remain responsible for their legal obligations, irrespective of whether their data resides in the cloud or not. Therefore, what a company needs to determine is whether or not it can protect, produce and consume sensitive data in the cloud with the same level of security and regulatory compliance as it does internally. Companies wishing to use cloud-based services should ascertain that their provider can meet their requirements and, if so, at what cost, if any. Meeting security and compliance requirements can be onerous and expensive for both parties. Litigious relations are often a direct result of not properly addressing the responsibilities of all parties in the contract. Therefore, any hosting business relationship should clearly state what jurisdiction applies to the hosting contract. Cloud hosting providers should honor the security and compliance requirements of their customers, and provide transparent answers to inquiries around those questions.
It should be clearly stated that dealing lawfully with corporate data, whether it be in the cloud or not, is not the responsibility of the cloud provider. It is always the responsibility of the company to protect the data it produces, no matter where that data is located. In other words, the processes used to deal with the legal complexity of managing data should not be different in the cloud than in a self-owned datacenter. A company must know what it is doing in the cloud by first creating its security and regulatory compliance processes internally, and then ensuring that they can be carried over to the cloud equally well, whether by the provider or by the company itself.
Finally, CEOs and CIOs need to understand that cloud computing requires new policies and new controls because it may give rise to new ICT risks that can have an operational and even strategic impact on the enterprise's efficiency and effectiveness. Adopting cloud computing to externalize computing resources poses the question of ascertaining opportunities versus operational and strategic risks.

Wednesday 22 May 2013

Discussing Private & Public Cloud


In my last post I explained in detail what public and private clouds are. My team has some more points in mind, discussed below.

Cloud computing is a style of computing whereby, to qualify as a cloud, the offered services should adhere to a combination of attributes, not just one. Stronger examples of cloud services will adhere to more attributes of cloud computing than will weaker ones. Consumers and providers of cloud services must examine these attributes to determine whether the services will deliver the expected outcome.
The greatest differences between private and public clouds reside in the level of support available, upfront costs, the extent to which the infrastructure can be isolated and secured from external threats, and the ability to customize the service delivery and meet regulatory compliance requirements, as we will see in greater detail below. As public clouds are built atop shared and virtualized infrastructures, there are generally more limited customization possibilities than with private clouds. Private clouds, built atop dedicated servers, storage and networks, can more easily meet the enterprise's security policies, governance and best-practice requirements.

But private cloud computing should be viewed as a continuous evolution towards rationalization of the datacenter and improved operational efficiency rather than as a discontinuous innovation. This trend is not new in ICT, as it already started with the consolidation of business applications through virtualization. Thus, private cloud computing pushes these rationalization and efficiency objectives one step further by enabling a service-based delivery approach for the firm's ICT resources and charging consumers (i.e. business units) on a per-usage basis. However, private cloud computing does not bring several of the key business benefits of cloud computing itself, namely the elimination of an upfront commitment, the transformation of capital expenses into operational expenses, and the availability of a seemingly unlimited amount of ICT resources at the snap of a finger. Small and medium businesses that do not have a critical mass of compute, storage and network resources to share in a pool, or the human capital and expertise to build and maintain a cloud-based service delivery model, will not be able to realize the expected economies of scale and operational efficiency promised by cloud computing.
As a general rule of thumb, it is wise to avoid endless discussions about what is and what is not cloud computing and focus on examining how much a given provider can deliver of the value proposition of cloud computing through the support of its fundamental characteristics.

Tuesday 21 May 2013

Cloud Computing Deployment Models

A common misperception about cloud computing is that, eventually, there will be only a handful of cloud platforms, all of them public. This is highly unlikely given the complex ICT needs of large organizations, according to the consulting firm Accenture in its special report “What the Enterprise Needs to Know about cloud computing”.
While some general-purpose public clouds will exist, two other types of cloud are likely to emerge. One type, community or speciality clouds, will cater to the particular needs of a select group of organizations, an industry or even a country. The healthcare industry is a good example, since the inherent nature of medical records underscores the need for clouds to be non-public so as to ensure data security, while leveraging shared infrastructures to lower ICT costs. Likewise, some large multinationals may opt to build and operate their own private clouds or internal clouds while continuing to tap into external cloud sources. In this way, they can achieve both elasticity and control over service quality, security, data ownership and integrity, and other important regulatory issues. Furthermore, there are applications that simply don't run well in a pure multi-tenant environment. Databases, for example, perform better on dedicated hardware where they don't have to compete for server input/output (I/O) resources. Plus, some businesses prefer to run databases on dedicated hardware for PCI compliance reasons or because they do not want sensitive data to reside on a shared platform, even if the environment is highly secure.
Other applications, such as web servers, run well in the cloud because they can use the elasticity of the cloud to scale rapidly. For example, GoGrid Hybrid Hosting gives businesses the option and flexibility of building a secure, high-performance scalable server network for hosting web applications, using a combination of cloud and dedicated server hosting interconnected via a private network link.
Overall, the NIST as well as other practitioners and academics agree on four common cloud computing deployment models.

Public Cloud
In simple terms, public cloud services are characterized as being available to clients from a third-party service provider via the Internet. The term “public” does not always mean free, even though it can be free or fairly inexpensive to use. A public cloud does not mean that a user's data is publicly visible; public cloud vendors like Amazon typically provide an access control mechanism for their users. Public clouds provide an elastic, cost-effective means to deploy solutions.

Private Cloud
Private cloud computing—sometimes called Enterprise or Internal cloud computing—is a style of computing where scalable and elastic ICT-enabled capabilities are delivered as a service to internal customers using Internet technologies. This definition is very similar to the definition of public cloud. Hence, the distinction between private cloud and public cloud relates to who can access or use the services in question and who owns or coordinates the resources used to deliver the services (Daryl C. Plummer et al. 2009, p.5). In other words, a private cloud is a cloud that implements the cloud computing model in a private facility where only a single organization has access to the resources that are used to implement the cloud. Therefore, it is a cloud that an organization implements using its own physical resources such as machines, networks, storage, and overall data center infrastructure (Wolsky 2010). A private cloud intends to offer many of the benefits of a public cloud computing environment, such as being elastic and service-based, but differs from a public cloud in that in a private cloud-based service environment, data and processes are managed within the organization for an exclusive set of consumers without the restrictions of network bandwidth, security exposures and legal requirements that public cloud services may entail. In addition, private cloud services are supposed to offer providers and users greater control over the infrastructure, improve security and service resilience because its access is restricted to designated parties. Nonetheless, a private cloud is not necessarily managed and hosted by the organization that uses it as it can be managed by a third party and be physically located off premises, built atop of a public cloud infrastructure or built as a hybrid cloud. In principle, a private cloud assumes a dedicated hardware environment of pooled hardware resources with a virtualization layer running on top of it, allowing an enterprise to create and manage multiple virtual servers within a set of physical servers and charge the organization's business units per usage. According to Gartner in (Bittman 2009, p2), it is envisioned that private  clouds may prevail in the first phases of the cloud computing era whereby many large companies will offload their ICT operations from running their own data and enterprise applications to secure offsite clouds linked to the company's offices through virtual private   networks (VPN) over the Internet. There is some amount of controversy whether a private cloud should be considered as a genuine cloud-based computing environment. For instance, (Armbrust et al. 2009) argues that except for extremely large infrastructures of hundreds of thousands of machines, such as those operated by Google or Microsoft, private clouds exhibit only a subset of the potential benefits and characteristics of public clouds.
There are inherent limitations to consider with private clouds when it comes to elasticity and scaling, because the number of virtual machines that can be provisioned is limited by the physical hardware infrastructure. An enterprise can of course add more machines to expand the infrastructure's compute power, but this cannot be done as fast and seamlessly as with public clouds. Thus, (Armbrust et al. 2009, p.13) argues against labeling private clouds as full-fledged cloud computing platforms, as this would lead to exaggerated claims. However, they acknowledge that private clouds could get most of the cloud-based computing benefits when interconnected with public clouds through a hybrid cloud-based computing model. The Table below summarizes the key differences between public clouds and private clouds.

Community Cloud
A community cloud is controlled and used by a group of organizations that have shared interests, such as specific security requirements or a common mission. The members of the community share access to the data and applications in the cloud.

Hybrid Cloud
A hybrid cloud is a combination of a public and private cloud that interoperate. In this model, users typically outsource nonbusiness-critical information and processing to the public cloud, while keeping business-critical services and data in their control. The embodiment of hybrid clouds is sometimes found in what is called a Virtual Private Cloud (VPC) whereby a portion of a public cloud is isolated to be dedicated for use by a single entity or group of related entities such as multiple departments within a company. In its simplest form, access to VPC services will be limited to a single consumer and will deliver a service consumption experience that is virtually identical to the public cloud services. VPC services are an emerging phenomenon driven by consumers that are interested in the potential of cloud computing, but who do not want to concede too much control, or share their computing environment with other customers. When combined with a hybrid cloud computing model (for example, using internal resources and external cloud computing services) (Wood et al. n.d.), VPC services have the potential to bridge the gap between public and private cloud models. By providing additional control, management and security beyond that of public cloud services, the VPC approach reduces risks and makes it feasible to deploy a wider range of enterprise applications.
Cloud bursting is a technique used along with hybrid clouds to provide additional resources to private clouds on an as-needed basis. As long as the private cloud has the processing power to handle its workloads, no public resources are used; when workloads exceed the private cloud's capacity, the public side of the hybrid cloud automatically supplies the additional capacity, as sketched below.
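
As an illustration only, the decision logic behind cloud bursting can be sketched in a few lines of Python. The capacity figure and the provision_public_instances helper are hypothetical placeholders for whatever API a real hybrid cloud setup would expose.

    # Hypothetical cloud-bursting sketch: names and thresholds are illustrative only.
    PRIVATE_CAPACITY = 100  # workload units the private cloud can absorb

    def provision_public_instances(units_needed):
        """Placeholder for a call to a public cloud provider's API."""
        print(f"Bursting: requesting capacity for {units_needed} units from the public cloud")

    def schedule(workload_units):
        if workload_units <= PRIVATE_CAPACITY:
            print(f"Private cloud handles all {workload_units} units; no bursting needed")
        else:
            overflow = workload_units - PRIVATE_CAPACITY
            print(f"Private cloud handles {PRIVATE_CAPACITY} units")
            provision_public_instances(overflow)

    schedule(80)   # stays entirely in the private cloud
    schedule(140)  # 40 units burst to the public cloud
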
All three main cloud providers examined for this study (GoGrid, Amazon and Rackspace) provide some form of hybrid cloud computing services.

Saturday 18 May 2013

Fundamental Characteristics of Cloud Computing

According to the NIST (Mell & Grance 2009), there are six fundamental characteristics that make cloud computing different from other, more conventional ICT outsourcing facilities. Many academics and industry analysts, including (Armbrust et al. 2009), (Vaquero et al. 2008), (McKinsey & Company 2009) and (Daryl C. Plummer et al. 2009), have refined these characteristics and argue that, to be considered a cloud service, a solution should adhere to some combination of these attributes.

On-demand self-service: a customer can unilaterally provision computing resources as needed, such as server time and network storage, without requiring any human intervention in the process, using just a credit card; it is not even necessary to be registered as a legal commercial entity. Plummer et al., in “Five Refining Attributes of Public and Private Cloud Computing”, further stress the importance of the service-based characteristic of cloud computing: consumer concerns should be abstracted from provider concerns, in that the interfaces should hide the implementation details and enable a completely automated, ready-to-use response to the consumer of the service. The articulation of the service is based on service levels and ICT concerns such as availability, response time, performance versus price, and clear and predefined operational processes, rather than on technology and its capabilities. In other words, “what the service needs to do is more important than how the technologies are used to implement the solution”.
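
To make the self-service idea concrete, the sketch below provisions a server with a single API call and no human in the loop. It assumes the AWS SDK for Python (boto3), and the AMI ID, region and instance type are placeholder values, so treat it as an illustration rather than a ready-to-run recipe.

    # Illustrative only: the AMI ID, region and instance type are placeholders.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # One API call provisions compute capacity on demand, with no human in the loop.
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder machine image
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
    )
    print("Launched instance:", response["Instances"][0]["InstanceId"])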

Pay-per-use: customers pay for services on a pay-per-use (or pay-as-you-go) basis. This characteristic addresses the utility dimension of cloud computing, although utility computing must not be confused with cloud computing as a whole: it refers to the pay-per-use business model of the cloud rather than being a substitute term. Furthermore, pay-per-use pricing makes no difference between buying compute power from one virtual machine for 1000 hours and buying compute power from 1000 virtual machines for one hour; the cost is the same. Altogether, the pay-per-use characteristic implies that enterprises incur no infrastructure capital costs, only operational costs, with no contractual obligations.
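
The cost symmetry described above is easy to check with a back-of-the-envelope calculation; the hourly rate of $0.10 used below is purely illustrative, not a real provider price.

    # Illustrative rate only: assume a virtual machine costs $0.10 per hour.
    rate_per_vm_hour = 0.10

    cost_serial   = 1    * 1000 * rate_per_vm_hour  # 1 VM running for 1000 hours
    cost_parallel = 1000 * 1    * rate_per_vm_hour  # 1000 VMs running for 1 hour

    print(cost_serial, cost_parallel)  # both come to 100.0, so the bill is identical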

Scalable and elastic: elasticity is a key characteristic of cloud computing, which resides in the ability to quickly increase or decrease ICT resources (i.e. compute power, storage and network capacity) on demand. Elasticity is a trait of shared pools of resources. It allows provisioning any number of server instances and any amount of disk storage programmatically and economically, that is, on a fine-grained basis with a lead time of minutes rather than weeks. To a customer, the capabilities available for provisioning ICT resources often appear to be unlimited and can be purchased in any quantity at any time, which allows matching resource allocation with the current workload. Scalability, in contrast, is a feature of the underlying infrastructure and software platforms. For example, AWS offers a service called Auto Scaling that removes the need for customers to plan far ahead of time for more capacity: it automatically scales Amazon EC2 capacity up or down according to conditions set by the customer. Amazon claims that with Auto Scaling, customers can ensure that the number of Amazon EC2 instances they are using scales up seamlessly during demand spikes to maintain performance, and scales down automatically during demand lulls to minimize costs. Complementary to elasticity are the concepts of over-drafting and cloud-bursting. Cloud-bursting pushes elasticity even further by automatically migrating application workloads out of the in-house datacenter to external clouds when the primary infrastructure is overloaded. The concept of over-drafting, in turn, supports the idea of a cloud market exchange whereby providers could trade compute capacity at a spot price as supply and demand fluctuate over time.
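
For readers who want to see what such a policy looks like in practice, the sketch below uses the AWS SDK for Python (boto3) to attach a target-tracking scaling policy to an existing Auto Scaling group. The group name and the 50% CPU target are assumptions for illustration, not values taken from this text.

    # Sketch only: "my-web-asg" and the 50% CPU target are hypothetical values.
    import boto3

    autoscaling = boto3.client("autoscaling", region_name="us-east-1")

    # Keep the group's average CPU utilisation around 50%: AWS adds instances
    # during demand spikes and removes them during lulls.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-web-asg",
        PolicyName="keep-cpu-around-50",
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
        },
    )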

Shared resource pooling: ICT resources are pooled to serve multiple consumers in a multi-tenant (shared) fashion in order to leverage economies of scale and achieve maximum efficiency. Armbrust et al. (2009) use the term statistical multiplexing to describe how providers bet on the likelihood that, at any point in time, demand never exceeds the capacity of the infrastructure, by shuffling virtual resources across datacenters. This scheme works on the principle that the more applications and users share the cloud, the more dispersed the load profiles become, so that applications running at peak load can “borrow” resources from idle applications or applications running in a quiet state. To be efficient, resource pooling requires a high degree of virtualization, automated operations and state-of-the-art resource provisioning capabilities. There is also a sense of location independence and transparency, in that the customer generally has no control over or knowledge of the exact location of the provided resources, but may be able to specify location at a higher level of abstraction (for example, country, state or datacenter).
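
A toy calculation illustrates why statistical multiplexing pays off. The per-application load profiles below are invented numbers, but they show that the peak of the pooled load is well below the sum of the individual peaks, so a shared pool needs less capacity than dedicated silos would.

    # Invented hourly load profiles (in "server units") for three applications.
    app_a = [10, 80, 20, 15]  # peaks in hour 2
    app_b = [60, 10, 15, 20]  # peaks in hour 1
    app_c = [15, 20, 10, 70]  # peaks in hour 4

    sum_of_peaks = max(app_a) + max(app_b) + max(app_c)                    # dedicated silos
    pooled_peak = max(a + b + c for a, b, c in zip(app_a, app_b, app_c))   # one shared pool

    print(sum_of_peaks)  # 210 units if each application is provisioned for its own peak
    print(pooled_peak)   # 110 units when the load is multiplexed across a shared pool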

Uses Internet technologies: cloud services are delivered using Internet technologies, including Uniform Resource Identifiers (URIs), data formats and protocols such as XML, HTTP and Representational State Transfer (RESTful) Web services. In other words, cloud computing is Internet-centric, allowing ubiquitous access over the network through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g. mobile phones, laptops and PDAs). In that sense, it offers a computing software model that is multi-platform, multi-network and global.
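
Because access relies only on standard Internet plumbing, any client that speaks HTTP can consume a cloud service. The snippet below issues a plain RESTful GET request with the Python standard library; the endpoint URI is a placeholder, not a real service.

    # Minimal sketch of cloud-style access over standard Internet technologies.
    # The endpoint URI is a placeholder, not a real service.
    import urllib.request

    request = urllib.request.Request(
        "https://api.example.com/v1/instances",   # resource identified by a URI
        headers={"Accept": "application/json"},   # standard HTTP content negotiation
        method="GET",
    )
    with urllib.request.urlopen(request) as response:  # plain HTTP(S), usable from any client platform
        print(response.read().decode("utf-8"))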

Metered by use: cloud services are tracked with usage metrics to enable multiple payment models. The service provider has a usage accounting model for measuring the use of the services, which can then be used to create different pricing plans and models. These may include pay-as-you-go plans, subscriptions, fixed plans and even free plans. The implied payment plans are based on usage, not on the cost of the equipment: they are based on the amount of service consumed, which may be expressed in terms of hours, data transfers or other use-based attributes delivered, providing transparency for both the provider and the consumer.
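
A simple usage-accounting sketch shows how metered consumption translates into a bill under a pay-as-you-go plan; the rates and usage figures below are made-up numbers used only to illustrate the idea.

    # Illustrative metering sketch: rates and usage records are made-up numbers.
    rates = {"instance_hours": 0.10, "gb_transferred": 0.05}  # dollars per unit

    usage = {"instance_hours": 730, "gb_transferred": 120}    # metered over one month

    bill = sum(usage[item] * rates[item] for item in usage)
    print(f"Pay-as-you-go charge: ${bill:.2f}")               # 730*0.10 + 120*0.05 = 79.00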