zephyr
<p>I will use this space to share my thoughts on cloud computing and distributed computing technologies. If you are interested, please do subscribe.</p>
<b>Feature engineering tips for improving predictions (2016-12-19)</b>
<p>Visit my blog post on WordPress: <a href="https://asimovweb.wordpress.com/2016/12/19/feature-engineering-tips-for-improving-predictions/">https://asimovweb.wordpress.com/2016/12/19/feature-engineering-tips-for-improving-predictions/</a></p>
<b>Choose your best platform for machine learning (2016-10-06)</b>
<p>My new blog entry on choosing the best platform for a machine learning based solution:
<a href="https://asimovweb.wordpress.com/2016/10/06/choose-your-best-platform-for-machine-learning-solution/">https://asimovweb.wordpress.com/2016/10/06/choose-your-best-platform-for-machine-learning-solution/</a>Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-90774022618644624792016-10-06T08:13:00.002-07:002016-10-06T08:15:28.648-07:00Moving to wordpress blog I have moved my blog to wordpress , Please continue to follow me at <a href="https://asimovweb.wordpress.com/">https://asimovweb.wordpress.com/</a>Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-27210980390984182472016-05-01T13:28:00.001-07:002016-05-01T13:46:42.255-07:00Fast forward transformation process in data science with Apache Spark <B>Data Curation :</B>
<p>Curation is a critical process in data science that prepares data for feature extraction and for machine learning algorithms. Curation generally involves extracting, organising and integrating data from different sources, and it can be a difficult and time-consuming process depending on the complexity and volume of the data involved.</p>
<p>Most of the time data won't be readily available for the feature extraction process; it may be hidden in unstructured and complex data sources and has to undergo multiple transformations before feature extraction.</p>
<p>When the volume of data is huge, this becomes a very time-consuming step and can be a bottleneck for the whole machine learning pipeline.</p>
<B>General Tools used in Data Science : </B>
<li><a href="https://www.r-project.org/about.html">R Language</a> - Widely adopted in data science with lot of supporting libraries</li>
<li><a href="http://www.mathworks.com/">Mat lab</a> - Commercial tool with lot of builtin libraries for data science </li>
<li><a href="http://spark.apache.org/mllib/">Apache Spark</a> - New, powerful and gaining traction, Spark on Hadoop provides distributed and Resilient architecture help to fasten the curation process by multiple times.</li>
<B>Recent Study</B>
<p>One of my projects involved curating and extracting features from a huge volume of natural language conversation text. We started with the R programming language for the transformation process; R is simple and has a lot of functionality in the statistics and data science space, but it has limitations in computation and memory, and in turn in efficiency and speed. We migrated the transformation process to Apache Spark and observed a tremendous improvement in performance: for the same large volume of data, the transformation time came down from more than a day to roughly an hour.</p>
<p><B>Here are some of the benefits of Apache Spark over R that I would like to highlight.</B></p>
<li>Effective Utilization of resources:</li>
<p>By default R runs on a single core and is limited by the capabilities of that core and by memory. Even on a multi-core system R uses only one core; for memory, a 32-bit R process is limited to about 3 GB of user virtual address space, while a 64-bit R process is limited by the amount of RAM. R does have parallel packages that can help spread the processing across multiple cores.</p>
<p>Spark runs in a distributed form, with the processing done by executors, each running in its own process and utilizing its own CPU and memory. Spark brings the concept of the <a href="http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds">RDD (Resilient Distributed Dataset)</a> to achieve a distributed, resilient and scalable processing solution.</p>
<li>Optimized transformation:</li>
<p>Spark has the concept of <a href="http://spark.apache.org/docs/latest/programming-guide.html#transformations">Transformations and Actions</a>: transformations are evaluated lazily and no job is executed until an Action is called. This brings optimization when multiple transformations are chained before the Action that finally transfers results back to the driver program (a short PySpark sketch after this list illustrates the idea).</p>
<li>Integration with the Hadoop ecosystem</li>
<p>Spark integrates well into the <a href="http://spark.apache.org/docs/latest/running-on-yarn.html">Hadoop ecosystem with the YARN architecture</a> and can easily bind to HDFS and to multiple NoSQL databases such as HBase, Cassandra, etc.</p>
<li>Support for multiple languages:</li>
<p>Spark APIs support multiple programming languages such as Scala, Java and Python.</p>
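<p>To make the RDD and lazy-evaluation points above concrete, here is a minimal PySpark sketch (my own illustration, not code from the project described earlier); the HDFS path and the tokenization logic are hypothetical. The transformations only build up a plan; the single action at the end triggers one optimized distributed job on the executors.</p>
<pre><code>
from pyspark import SparkContext

sc = SparkContext(appName="curation-sketch")

# Load raw conversation text as a distributed dataset (RDD), one record per line.
lines = sc.textFile("hdfs:///data/conversations.txt")  # hypothetical path

# Transformations are lazy: nothing runs on the cluster yet.
tokens = lines.flatMap(lambda line: line.lower().split())
words = tokens.filter(lambda w: w.isalpha())
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)

# The action triggers execution and ships only the small result to the driver.
print(counts.takeOrdered(20, key=lambda kv: -kv[1]))

sc.stop()
</code></pre>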
<b>Google Auto Awesome Video - Is it a machine learning solution? (2016-01-03)</b>
<p>Wish you all a very happy and wonderful new year 2016!</p>
<p>I am happy to start the year with a post covering some aspects of machine learning; this post was actually inspired by the new year's eve celebration.</p>
<p>During the new year's eve celebration with friends I captured some photo moments using the Google Photos app. The next morning I got a notification on my mobile: would you like to review and save a video made out of the photos from the new year's eve event, with some nice background music added? In Google's terms these are called <a href="http://www.androidcentral.com/creating-auto-awesome-videos-new-google-photos-app">Auto Awesome videos</a> in the Google Photos app.</p>
<p>I was happy to see the video that had been made automatically and was ready to share. There is also a manual mode where we can customize the photos for the video, but my interest is in the automatic creation, and I started thinking about how this design could have been done.</p>
<p>At first cut I sensed this could potentially be a machine learning implementation, and with my limited data science knowledge I thought I would offer some guesswork on how this could have been designed to run at large scale, for millions of tenants, at the server end.</p>
<p>Let us understand the requirement in detail. Given a collection of images we have to perform the following:</p>
<li>Categorize the images into groups and pick the group corresponding to a specific event, say the new year celebration in this case.</li>
<li>To improve accuracy, check for and eliminate any irrelevant images that went into the group by error.</li>
<li>Judge the mood of the event and add appropriate background music.</li>
<p>Now let us analyse the type of machine learning solutions that could have potentially been used for this design</p>
<li>The first part of the problem is categorizing the images into groups based on some parameters; a <a href="https://en.wikipedia.org/wiki/K-means_clustering">clustering algorithm</a> could be a good fit for this. Given a dataset, clustering partitions it based on features of the data. In our scenario the grouping could be based on the time a photo was taken, but I have seen cases where grouping is done based on the image background and the persons involved (a minimal clustering sketch follows this list).</li>
<li>Next is to eliminate outliers in the grouping: some photos might have accidentally gone into the group. Algorithms like <a href="https://en.wikipedia.org/wiki/Anomaly_detection">anomaly detection</a> can be used to eliminate those outlier images from the collection.</li>
<li>The final step is to understand the mood of the images and add relevant background music to the video; a <a href="http://www.technologyreview.com/view/533061/neural-network-rates-images-for-happiness-levels/">sentiment analysis</a> algorithm on pictures could potentially help to understand the mood of the images.</li>
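<p>As a toy illustration of the clustering idea above (purely my own sketch, certainly not Google's implementation), the snippet below groups photos into candidate events by capture time using k-means. The timestamps and cluster count are made up, and scikit-learn is assumed to be available.</p>
<pre><code>
# Group photos into candidate "events" by capture time (hypothetical data).
from sklearn.cluster import KMeans
import numpy as np

# Capture times in hours since the start of the week (made-up values).
capture_hours = np.array([[1.0], [1.5], [2.0], [50.2], [50.8], [51.1], [120.4]])

# Partition the photos into 3 candidate event clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(capture_hours)

for photo_idx, event_id in enumerate(kmeans.labels_):
    print(f"photo {photo_idx} -> event cluster {event_id}")
</code></pre>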
<p>Disclaimer: this is purely my own guesswork about the design, and Google might have done it in a different way :)</p>
<b>NOSQL with RDBMS fallback (2015-12-07)</b>
<p><a href="http://nosql-database.org/">NOSQL </a>adoption becoming prominent across different critical applications to reap the benefits of performance, fault tolerance, high availability for bigger volume database needs. While migrating to NOSQL one of the risk that architects feel is what if the application gets into some unseen issues and take more time to fix , as NOSQL adoption is not battle tested across different domain and sectors and how to design some fallback strategy. </p>
<p>Few factors that people may think while migrating to NOSQL</p>
<li>What will happen if we get into unexpected errors in production and it takes more time to fix them?</li>
<li>What if the product vendors themselves haven't faced such scenarios?</li>
<li>What if I have reporting or other dependent systems that are well integrated with the RDBMS and difficult to migrate off the RDBMS in the current phase?</li>
<p>Architects would like to design a fallback option with the RDBMS, where the application can switch to the RDBMS on unrecoverable NoSQL issues. This raises a few questions on how to design it.</p>
<li>How do I keep data in sync between NoSQL and the RDBMS at high data volume without losing the order of updates?</li>
<li>How do I sync up without adding much overhead to the application? Synchronous updates to both NoSQL and the RDBMS would be too much overhead.</li>
<li>How do I reliably move the data between the two systems without any loss?</li>
<li>What if the RDBMS goes down? How can I design the sync-up to be reliable even on failures?</li>
<p>I can think of the design depicted below to address these questions.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiox8J8tNcMTPiAgmh1isNfDHgo2cxPEIjprtQYU08uCZaLomFkZ9ccPtUTimrr-7EYx5wXRcxbOMYswBbHft6IM1yvau2yWJSOFOMZo5RzMraoI_XwyIJL-0C5gOiYfE6F3m0aBGXDnyx3/s1600/NOSQL+with+RDBMS+fallback.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiox8J8tNcMTPiAgmh1isNfDHgo2cxPEIjprtQYU08uCZaLomFkZ9ccPtUTimrr-7EYx5wXRcxbOMYswBbHft6IM1yvau2yWJSOFOMZo5RzMraoI_XwyIJL-0C5gOiYfE6F3m0aBGXDnyx3/s640/NOSQL+with+RDBMS+fallback.png" /></a></div>
<p>The main components involved in the design are <a href="http://kafka.apache.org/">Apache Kafka</a>, which receives the updates, and <a href="http://storm.apache.org/index.html">Apache Storm</a>, which processes the data and applies it to the RDBMS. Both of these systems are designed for big data needs in a reliable and distributed form.</p>
<p><a href="http://kafka.apache.org/">Apache Kafka</a> is a high performance message queuing system. The application posts the messages (insert / update / delete) to a Kafka topic. To improve performance through parallel processing, the topic can be partitioned by table / region / logical data design as per the NoSQL model.</p>
<p><a href="http://storm.apache.org/index.html">Apache Storm</a> is a real time processing engine that consumes messages through a Spout component, processes them through Bolts and updates the data in the RDBMS. Storm topologies support <a href="http://storm.apache.org/documentation/Guaranteeing-message-processing.html">guaranteed message processing</a> and <a href="http://storm.apache.org/documentation/Transactional-topologies.html">transactional modes of commitment</a>, which makes it suitable for handling partial failures during commits.</p>
<p><b>Benefits: </b></p>
<li>Information can be propagated to the RDBMS asynchronously and reliably.</li>
<li>Apache Kafka provides a highly available, reliable, partitioned queuing system that fits well for huge data volumes.</li>
<li>Storm does real time processing of the Kafka messages on the partitioned topic and provides a reliable way to update the RDBMS.</li>
<li>On RDBMS failures Kafka will persist the messages, and Storm can continue to sync up the messages when the RDBMS comes back.</li>
<b>Next Generation Enterprise Application Architecture (2015-12-02)</b>
<p>New generation applications are architected not only with the goal of being functionally correct and stable but also with a focus on several aspects that are becoming critical:</p>
<p><b>Scalability</b> – Elastic scalability for all the layers of the application including data tier</p>
<p><b>Fault Tolerance</b> - Ability to handle failure smartly and avoid cascading failures from and to dependent systems</p>
<p><b>High Availability</b> – Ability to keep the application highly available at all layers, including the database, even on data center failures</p>
<p><b>Efficient utilization of Infrastructure</b> - Ability to scale up and down on demand </p>
<p><b>Faster access</b> to underlying data on high load and data volumes</p>
<p>Ability to handle <b>different data formats</b> efficiently</p>
<p>A few drivers tied to this evolution are the need for (and benefits of) cloud adoption, whether private or public cloud, and the need to handle huge data volumes with fast response times in the data tiers.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgq8eRmeqWC2LTr-mgn8bd7Shz9S5AqWp7iMTWtWiXn7n70xaSpVz9-DQ-1QMo8MHqMMtHX7ilSwFz_8KPE9dAX6eX2Tyjakld2d8XSnCFAZo1V2n2H3wf6FvOo_18dGGRLJz4jTP1E7eSO/s1600/nextgenarch.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgq8eRmeqWC2LTr-mgn8bd7Shz9S5AqWp7iMTWtWiXn7n70xaSpVz9-DQ-1QMo8MHqMMtHX7ilSwFz_8KPE9dAX6eX2Tyjakld2d8XSnCFAZo1V2n2H3wf6FvOo_18dGGRLJz4jTP1E7eSO/s640/nextgenarch.png" /></a></div>
<table border="1" style="width:100%">
<col width="100">
<col width="175">
<col width="175">
<tr>
<th></th>
<th>Benefits</th>
<th>Solutions</th>
</tr>
<tr>
<td>Physical -> IaaS -> PaaS</td>
<td><p>Elastic Scalability </p>
<p>High Availability</p>
<p>Efficient Infrastructure utilization</p>
<p>Zero downtime deployment</p>
</td>
<td><p><a href="https://www.vmware.com/">VMWare </a>, <a href="https://www.openstack.org/">Open Stack</a> – Private Cloud IaaS</p>
<p><a href="https://aws.amazon.com/">AWS</a>, <a href="https://azure.microsoft.com/">Azure </a>– Public Cloud IaaS , PaaS</p>
<p><a href="https://www.cloudfoundry.org/">Cloud Foundry</a> – PaaS on private and public cloud</p>
</td>
</tr>
<tr>
<td>Circuit Breaker</td>
<td><p>Fault Tolerance </p>
<p>Better failure handling</p>
<p>Avoid avalanche failures</p>
</td>
<td><p><a href="https://github.com/Netflix/Hystrix/tree/master/hystrix-dashboard">Netflix Hystrix</a></p>
<p><a href="https://github.com/App-vNext/Polly">Polly</a></p>
</td>
</tr>
<tr>
<td>Service Registry</td>
<td><p>Registry for dynamic instance scaling </p>
</td>
<td><p><a href="https://github.com/Netflix/eureka">Netflix Eureka</a></p>
<p><a href="https://zookeeper.apache.org/">Apache Zookeeper</a></p>
</td>
</tr>
<tr>
<td>Intelligent Load balancing</td>
<td><p>Intelligent Load Balancing utilizing the elastic scaling and self-discovery </p>
</td>
<td><p><a href="https://github.com/Netflix/ribbon">Netflix Ribbon </a></p>
<p><a href="https://f5.com/glossary/load-balancer/">F5</a></p>
<p><a href="https://www.nginx.com/">Nginix</a></p>
</td>
</tr>
<tr>
<td>Search</td>
<td><p>Quick search needs from huge data sets, full text search, pattern matching </p>
</td>
<td><p><a href="https://www.elastic.co/">Elastic Search</a> </p>
<p><a href="http://lucene.apache.org/solr/">Solr</a></p>
</td>
</tr>
<tr>
<td>Data Grid</td>
<td><p>Faster data reads and writes, reduced read / write overhead on the database, high availability of data </p>
</td>
<td><p><a href="http://www.oracle.com/technetwork/middleware/coherence/overview/index.html">Coherance </a></p>
<p><a href="http://pivotal.io/big-data/pivotal-gemfire">Gemfire</a></p>
<p><a href="http://www.couchbase.com/">Membase</a></p>
</td>
</tr>
<tr>
<td>Queue</td>
<td><p>Reliable data transfer across different data layers </p>
</td>
<td><p><a href="http://kafka.apache.org/">Kafka </a></p>
<p><a href="https://www.rabbitmq.com/">RabbitMQ</a></p>
<p><a href="https://docs.oracle.com/javaee/6/tutorial/doc/bncdq.html">JMS</a></p>
</td>
</tr>
<tr>
<td>NoSQL</td>
<td>
<p>Big data – database needs</p>
<p>Heavy Read / Write on high data volumes</p>
<p>Faster response needs on the data</p>
<p>High Availability on data</p>
<p>Fault Tolerance on data</p>
<p>Distributed database</p>
<p>Scalable database</p>
</td>
<td><p><a href="http://www.couchbase.com">Couchbase </a></p>
<p><a href="https://www.mongodb.com">MongoDB</a></p>
<p><a href="https://hbase.apache.org/">HBase</a></p>
<p><a href="https://cassandra.apache.org/">Cassandra</a></p>
<p>Graph DB ( <a href="https://github.com/thinkaurelius/titan">Titan</a>, <a href="http://orientdb.com/docs/last/index.html">OrientDB </a>)</p>
</td>
</tr>
<tr>
<td>Hadoop</td>
<td>
<p>Distributed file processing and storage ecosystem</p>
<p>High speed batch (MapReduce) / real time ( <a href="http://storm.apache.org/">Storm</a>, <a href="http://spark.apache.org/">Spark </a>) processing</p>
</td>
<td><p>Different Hadoop distributions such as <a href="http://hortonworks.com/">Hortonworks</a>, <a href="http://www.cloudera.com/">Cloudera</a>, <a href="https://www.mapr.com/">MapR</a></p>
</td>
</tr>
</table>
<b>Kafka messaging system (2015-08-30)</b>
<b>Apache Kafka:</b>
<p><a href="http://kafka.apache.org/">Kafka </a>is an open source message queuing solution under Apache project, Kafka is new when compared to existing queue solutions like RabbitMQ, ActiveMQ, AWS SQS on product maturity but is quickly gaining momentum due to its features. In this post we will analyze some features of Kafka to see why it is gaining attention in the market.</p>
<p>The demand for processing huge data sets is growing everyday across enterprise systems and data is being processed in batch or real time and the queuing systems play an important role in connecting the data from source system / producer to destination / consumers. With huge dataset in transit enterprise are looking for message solution that can provide high throughput per second , scale horizontally, provides high availability and integrate well with other solutions.</p>
<b>Scalability:</b>
<p>This is one feature where Kafka gets an edge over other solutions: the ability to scale horizontally, which Kafka achieves by means of partitioning. We can set the number of partitions while defining a topic (queue), and these partitions get distributed across the broker nodes in the cluster. When we want to scale the system we can add more broker nodes, and the partitions get rebalanced across the added brokers.</p>
<b>Fault Tolerance and High Availability:</b>
<p>Kafka achieves high availability by means of replication: the partitions are replicated across different broker nodes, and Kafka uses ZooKeeper for its coordination. When a broker node goes down, ZooKeeper coordinates so that the data continues to be served from a replica partition on another broker, and hence high availability of the data is achieved.</p>
<b>Unit of Order:</b>
<p>Kafka guarantees ordered delivery within each partition; messages posted across different partitions are not guaranteed to be in order.</p>
<b>Reliability & Guaranteed delivery:</b>
<p>Kafka provides reliable message delivery and has options for synchronous and asynchronous acknowledgements of delivery.</p>
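<p>A small sketch of the synchronous and asynchronous acknowledgement options, again assuming the kafka-python client; the broker address, topic and payloads are placeholders. Messages sharing a key land on the same partition, which ties back to the per-partition ordering guarantee above.</p>
<pre><code>
from kafka import KafkaProducer

# acks="all" waits for the in-sync replicas, trading latency for durability.
producer = KafkaProducer(bootstrap_servers="localhost:9092", acks="all")

# Synchronous delivery: block on the future until the broker confirms the write.
md = producer.send("events", key=b"order-42", value=b"created").get(timeout=10)
print(md.topic, md.partition, md.offset)

# Asynchronous delivery: register callbacks and continue without blocking.
future = producer.send("events", key=b"order-43", value=b"created")
future.add_callback(lambda m: print("acked at offset", m.offset))
future.add_errback(lambda exc: print("delivery failed:", exc))
producer.flush()
</code></pre>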
<b>Integration with Big Data solutions:</b>
<p>Kafka comes as part of the Hadoop distributions and integrates with Hadoop MapReduce for bulk consumption in parallel; for real time stream processing needs, Kafka integrates well with systems like <a href="https://github.com/apache/storm/tree/master/external/storm-kafka">Apache Storm</a> and <a href="https://spark.apache.org/docs/1.2.0/streaming-kafka-integration.html">Spark</a>.</p>
Reference: <a href="http://kafka.apache.org/">Kafka</a>
<b>Build your own monitoring solution for Couchbase (2015-08-15)</b>
<p>Recently I was trying to build a monitoring solution for Couchbase. I followed a simple approach that worked out well and thought I would share it in this post.</p>
<ol><b>Requirements for the solution</b></ol>
<li>A simple solution that can collect metrics from the HTTP stats endpoint of Couchbase</li>
<li>A script based solution that can be customized by the operations team</li>
<li>Visualization dashboards</li>
<li>No additional software installation on the Couchbase servers</li>
<ol><b>Solution Needs</b></ol>
<li>A lightweight app server that collects metrics at a regular interval</li>
<li>A persistence layer that stores the data </li>
<li>A visualization tool that can bind well with the persistent data</li>
<p><b>Solution Architecture</b></p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnYczYzW-gcZqkpHoAcUxjExKFqLBr-k1P_igKuDQGz5V3JXDQ59j8MrCa3b-vmvm0MoGTieA9DoIo4D_AHQigkIvtUXRBy4Y1Zdnq1MqXQFBiA_Lx6iAtVwTUJF7phk6HKc4l-AaZEcke/s1600/nodejsbasedmonitoring.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnYczYzW-gcZqkpHoAcUxjExKFqLBr-k1P_igKuDQGz5V3JXDQ59j8MrCa3b-vmvm0MoGTieA9DoIo4D_AHQigkIvtUXRBy4Y1Zdnq1MqXQFBiA_Lx6iAtVwTUJF7phk6HKc4l-AaZEcke/s640/nodejsbasedmonitoring.png" /></a></div>
<p><b>Solution Highlights</b></p>
<li>The solution collects the JSON stats data, which contains thousands of metrics as a whole, and stores it in Elasticsearch (see the sketch after this list)</li>
<li>Node.js is a lightweight server runtime based on JavaScript</li>
<li>The Couchbase stats endpoint exposes JSON based metrics, and Elasticsearch works well for storing JSON data</li>
<li>Kibana provides nice visualization for Elasticsearch through different charts</li>
<li>Node.js provides client libraries for Elasticsearch</li>
<li>Hosting Node.js, Elasticsearch and Kibana is very simple; you can easily set up all of these components in a few minutes through Docker</li>
<li>Elasticsearch is highly scalable</li>
<li>The approach can be applied to any monitoring scenario where metrics are exposed in JSON format</li>
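<p>The actual solution is written in Node.js (see the GitHub link below); the sketch here is an equivalent in Python purely for illustration, assuming the requests and elasticsearch client libraries, with a hypothetical bucket stats URL, credentials and index name.</p>
<pre><code>
import time
import requests
from elasticsearch import Elasticsearch

STATS_URL = "http://couchbase-host:8091/pools/default/buckets/default/stats"  # hypothetical
es = Elasticsearch(["http://localhost:9200"])

while True:
    # Pull the JSON stats document exposed by the Couchbase HTTP endpoint.
    stats = requests.get(STATS_URL, auth=("admin", "password"), timeout=10).json()

    # Store the raw JSON with a timestamp; Kibana can chart it from this index.
    es.index(index="couchbase-metrics", body={"collected_at": time.time(), "stats": stats})

    time.sleep(60)  # collect on a regular interval
</code></pre>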
Please find the reference implementation of the solution on
<a href="https://github.com/Aravindakum/CouchbaseMontoring-NodeJS-Kibana">GitHub</a>.
<b>Two Phase Commit (2015-01-23)</b>
<p><b>Bottlenecks in the database layer</b></p>
<p>The database has often been seen as the most common place for performance bottlenecks across the different tiers of an application. A few possible reasons restricting RDBMS performance:</p>
<li>RDBMS not able to scale horizontally</li>
<li>Locking at row level / data page level / table level during database transactions </li>
<p><b>NoSQL on the rescue</b></p>
<p>I mentioned NoSQL data stores in my <a href="http://breezeoncloud.blogspot.in/2015/01/nosql-introduction.html">previous blog</a>, which covers how they achieve horizontal scalability through distribution. In this blog I would like to cover how transactional behaviour is achieved with high performance in NoSQL.</p>
<p><b>Transactions in RDBMS</b></p>
<p>Let us try to understand how transactions operate in an RDBMS. Transactions with ACID (Atomicity, Consistency, Isolation and Durability) compliance execute all the actions involved in the transaction as a single step: if all the actions succeed the changes are committed, otherwise all the changes are revoked. To achieve this, locking happens across the tables involved, and hence performance becomes a bottleneck.</p>
<p>Let us take a simple order placement transaction and analyse it. A simple order management transaction involves two tables, order and billing:</p>
<li>Confirm the product for the order by decrementing a count in the product catalogue </li>
<li>Confirm billing for payment</li>
<p>If the payment succeeds, the transaction as a whole has to be committed. If the payment fails for some reason, the change in the product catalogue has to be reverted to its original state so that the item is available for others to consume. An RDBMS achieves this whole process as a single step by locking these tables until the transaction is completed, which gives it the ability to commit or revoke at the end of the transaction; but this adds a performance overhead, as the tables stay locked and any read / write on them is kept on hold unless stale reads are enabled.</p>
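<p>A minimal illustration of this single-step behaviour (my own sketch, using Python's built-in sqlite3 as a stand-in RDBMS; the tables and the payment check are hypothetical): the catalogue decrement and the billing insert either commit together or are rolled back together.</p>
<pre><code>
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE catalogue (product_id INTEGER PRIMARY KEY, stock INTEGER);
    CREATE TABLE billing   (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO catalogue VALUES (1, 10);
""")

def place_order(order_id, product_id, amount, payment_ok):
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute("UPDATE catalogue SET stock = stock - 1 WHERE product_id = ?", (product_id,))
            if not payment_ok:
                raise RuntimeError("payment failed")
            conn.execute("INSERT INTO billing VALUES (?, ?)", (order_id, amount))
    except RuntimeError:
        pass  # the rollback restored the catalogue count

place_order(100, 1, 49.99, payment_ok=False)
print(conn.execute("SELECT stock FROM catalogue WHERE product_id = 1").fetchone())  # (10,)
</code></pre>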
<p><b>Restrictions with NoSQL </b></p>
<p>Let us understand the restrictions in NoSQL towards achieving this type of transaction:</p>
<li>NoSQL provides locking at the row level, not across rows or tables</li>
<li>With the adoption of polyglot persistence and distributed transactions, we may need to perform a transaction across different datastores as well</li>
<p><b>Two Phase Commit</b></p>
<p>Two Phase Commit is an approach followed in NoSQL to achieve transaction-like behaviour. As the name suggests, the transaction happens in two phases, with the ability during phase 2 to commit or revoke the changes made in phase 1. The approach introduces an additional component, a transaction manager, which helps to commit or roll back the changes made in each phase of the transaction.</p>
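<p>A rough sketch of the pattern, loosely modeled on the MongoDB two-phase-commit tutorial linked at the end of this post; the collection names, field names and pymongo usage are my own assumptions rather than code from that tutorial.</p>
<pre><code>
from pymongo import MongoClient

db = MongoClient()["shop"]

# Phase 1: record the intent in a transaction document and apply the change.
txn_id = db.transactions.insert_one(
    {"state": "pending", "order_id": 42, "product_id": 1}
).inserted_id
db.catalogue.update_one({"_id": 1}, {"$inc": {"stock": -1}})
db.transactions.update_one({"_id": txn_id}, {"$set": {"state": "applied"}})

# Phase 2: commit on successful payment, or hold the reservation for a retry.
payment_ok = False
if payment_ok:
    db.transactions.update_one({"_id": txn_id}, {"$set": {"state": "done"}})
else:
    # Keep the reserved item on hold and allow the payment to be retried later,
    # similar to the Amazon behaviour described below.
    db.transactions.update_one({"_id": txn_id}, {"$set": {"state": "awaiting_payment_retry"}})
</code></pre>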
<p><b>Advantages with Two Phase Commit approach</b></p>
<li>Provides high performance with transactions</li>
<li>Ability to retry the failed portions of the transaction (interesting)</li>
<li>Provides distributed transaction like capabilities across data stores</li>
<p><b>My personal experience with Two Phase Commit</b></p>
<p>Recently I personally came across a Two Phase Commit scenario handled by Amazon for an order placement of mine, which became the inspiration for this post.</p>
<p>I placed an order (a laptop desk) on Amazon; my order placement was received and I went to sleep. It looks like the payment failed for some reason. The next morning I got a notification to retry my payment. In this case, instead of revoking the order placed, Amazon held the order for additional time, say a day or two, and provided an option to retry the failed payment.</p>
<p><b>My order confirmation</b></p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRBHNw-dgCtX4Nuz0LDRTh8D0_7WyKAVRemeOsxRLKKcejfiwesnMwq6SQV3QfeyZqiDF1U_FdWYaYzEAhDSjOyEJK-L3zu_4PnvQZdQnG70JVXqJRV0DbpHQlO09k6cXxDJ2eeWAHbn3M/s1600/phase1-order.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiRBHNw-dgCtX4Nuz0LDRTh8D0_7WyKAVRemeOsxRLKKcejfiwesnMwq6SQV3QfeyZqiDF1U_FdWYaYzEAhDSjOyEJK-L3zu_4PnvQZdQnG70JVXqJRV0DbpHQlO09k6cXxDJ2eeWAHbn3M/s320/phase1-order.png" /></a></div>
<p><b>Payment retry for my order</b></p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbn5YVNsYKU09gqNHb62fmNZRAma5a5CWSxtLeuzozhnkIu56wIHjypDMnVaYUq_y4ZY0A2stGcjHVFIPI5UV17q2stBESiQuTt7SMGEgxAkE2Ru33r5d7Vix7pC8RfALijFQKLZxDlbGf/s1600/phase2-order.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgbn5YVNsYKU09gqNHb62fmNZRAma5a5CWSxtLeuzozhnkIu56wIHjypDMnVaYUq_y4ZY0A2stGcjHVFIPI5UV17q2stBESiQuTt7SMGEgxAkE2Ru33r5d7Vix7pC8RfALijFQKLZxDlbGf/s320/phase2-order.png" /></a></div>
<p>I believe Amazon has implemented some form of two phase commit to achieve this. I was personally happy with the way Amazon handled my payment failure: the order was not revoked, I was given a retry option later to complete the order, and my laptop desk was still reserved for me.</p>
<p>This also opens the door for other modes of payment like cash on delivery, etc.</p>
<p>A few links on Two Phase Commit:</p>
<li><a href="http://www.enterpriseintegrationpatterns.com/docs/IEEE_Software_Design_2PC.pdf">Starbucks approach for performance</a></li>
<li><a href="http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/">MongoDB - how to perform 2PC</a></li>Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-78118652619498674242015-01-03T12:29:00.002-08:002015-01-05T12:20:28.156-08:00NoSQL - An Introduction<b>NoSQL:</b>
<p>
Not Only SQL, often referred to as NoSQL, provides a mechanism to store and retrieve data other than through the tabular format used in relational databases.
</p><p>There are different NoSQL solutions that are mature and widely adopted, e.g. Redis, Riak, HBase, Cassandra, Couchbase, MongoDB.</p><p>It is critical to understand the concepts of NoSQL and why and how NoSQL is used in a specific application architecture, because every NoSQL solution is unique in its own way and different from general RDBMS solutions.
</p>
<b>Need for NoSQL:</b>
<p>
With the explosion of the web and social interactions, the volume and complexity of data has grown tremendously, and it is the need of the hour for applications to scale seamlessly without any compromise in performance. </p><p>RDBMS performance starts degrading at some point of data volume and complexity, and applications have to consider adopting various NoSQL solutions to match the growth in volume and complexity.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXMAw9YDnPT7iaybK0wghPZ6_d0VGTFSAKQPh9SiwDZ_K8CyWTanosC6Lxofvw5IEu-trhlFcG7Dig6JVt-QVUU9VFnKRm9csYDX4uzz3gzkOmFTgNBeZcxgeAf4cPi-5zQUkdhYNWPWrH/s1600/nosql+-sql.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXMAw9YDnPT7iaybK0wghPZ6_d0VGTFSAKQPh9SiwDZ_K8CyWTanosC6Lxofvw5IEu-trhlFcG7Dig6JVt-QVUU9VFnKRm9csYDX4uzz3gzkOmFTgNBeZcxgeAf4cPi-5zQUkdhYNWPWrH/s320/nosql+-sql.png" /></a></div>
<b>Polyglot persistence: </b>
<p>
NoSQL solutions have become more mature, and enterprise data architects have started implementing NoSQL in their solutions, sending a strong message that the RDBMS is not the only answer to data needs. </p><p>Problems in data persistence are unique, and each problem needs a specific solution to handle the scenario well. The concept of polyglot persistence evolved to insist that an application should use a specific persistence solution to handle each specific scenario. </p><p>The table below describes some scenarios in a retail web application and how different persistence solutions can help to satisfy those needs.
</p>
<p>
<table>
<tr>
<th>Scenario</th>
<th></th>
<th>Persistence solution</th>
</tr>
<tr>
<td>User Sessions</td>
<td></td>
<td>Redis</td>
</tr>
<tr>
<td>Financial data</td>
<td></td>
<td>RDBMS</td>
</tr>
<tr>
<td>Shopping Cart</td>
<td></td>
<td>Riak</td>
</tr>
<tr>
<td>Recommendations</td>
<td></td>
<td>Neo4j</td>
</tr>
<tr>
<td>Product Catalog</td>
<td></td>
<td>MongoDB</td>
</tr>
<tr>
<td>Analytics</td>
<td></td>
<td>Cassandra</td>
</tr>
<tr>
<td>User Activity Logs</td>
<td></td>
<td>Cassandra</td>
</tr>
</table></p><p>
<span class="c20">[Source: http://martinfowler.com/bliki/PolyglotPersistence.html]</span></p>
<b>Coming out of relational mindset:</b>
<p>
One of the biggest problems with the adoption of a NoSQL solution is getting people out of the relational mindset. The data modeling mindset is deeply rooted in RDBMS and relational concepts. </p><p>It will be difficult initially to conceptualize data outside the relational world, but if we understand these concepts and look back at the data solutions we have built, many of them may not need normalized modeling.</p>
The following are a few things that we will see when we move towards the NoSQL world (a small example follows the list):
<ul>
<li> Data is not normalized.</li>
<li> Data will be duplicated.</li>
<li> Tables are schema-less and don't follow a predefined pattern</li>
<li> Data can be stored in different formats like JSON, XML, audio, video etc.</li>
<li> The database may compromise on some of the ACID properties</li>
<li> The data may have some compromise on attributes like consistency.</li>
</ul>
</p>
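<p>As a small example of what a denormalized, schema-less record can look like (a hypothetical order document of my own making): the customer and item details live inside the order itself instead of being joined from separate normalized tables.</p>
<pre><code>
import json

# Hypothetical denormalized "order" document: customer and product details are
# duplicated per order rather than referenced through foreign keys.
order_document = {
    "order_id": 1001,
    "customer": {"id": 7, "name": "Asha", "city": "Chennai"},
    "items": [
        {"sku": "DESK-01", "title": "Laptop desk", "price": 49.99, "qty": 1},
    ],
    "status": "PLACED",
}

# The whole aggregate is read and written as one unit: no joins, no fixed schema.
print(json.dumps(order_document, indent=2))
</code></pre>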
<b>CAP theorem:</b>
<p>
CAP theorem defines a set of basic attributes for any distributed system. Understanding the dimensions of the CAP theorem helps in understanding any NoSQL solution better.
The diagram below describes the attributes satisfied by different distributed database systems in a multi-server deployment environment. </p><p>
The important point to note here is that no distributed system can completely satisfy all three dimensions of the CAP theorem: Consistency, Availability and Partition Tolerance. </p><p>Any distributed system can fully satisfy at most two dimensions of CAP; depending on the application requirements, one has to choose the specific distributed system that suits the needs.
</p><p>
It is critically important to understand the application requirements and understand where the specific NoSQL solution falls.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVKSMZigWxj9SQfDHij6lx7mirYVsf1sFOfWFs_JqhjeqS-yh5J2qnTCfNWNiHGG1tKHGC0OSUCiDLFL-45B5hc6fVBgsZi9JTh1ljvRj4gkTIsiGkV8uCYYqKnvedPmWWT6yuMdSYRs35/s1600/CAP.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgVKSMZigWxj9SQfDHij6lx7mirYVsf1sFOfWFs_JqhjeqS-yh5J2qnTCfNWNiHGG1tKHGC0OSUCiDLFL-45B5hc6fVBgsZi9JTh1ljvRj4gkTIsiGkV8uCYYqKnvedPmWWT6yuMdSYRs35/s320/CAP.png" /></a></div>
<b>ACID Compliance:</b>
<p>
ACID stands for <b>A</b>tomicity, <b>C</b>onsistency, <b>I</b>solation, <b>D</b>urability; these are the set of properties that guarantee transactional behavior in RDBMS operations.</p><p>RDBMS concepts focus heavily on integrity, concurrency, consistency and data validity, but many of the data needs in software applications may not require this level of integrity and validity, or they can be handled in upper layers.</p><p>Compromising some of these in the database architecture may bring the high performance and scalability that an RDBMS currently lacks.
</p><p>
A NoSQL database, for example, is not strictly ACID compliant; it can compromise on one of the ACID attributes to achieve extreme scalability and performance.</p><p>
It is critically important to understand the application requirements and understand the specific NoSQL used and how the compromise is made.</p>
<b>BASE versus ACID:</b>
<p>
Instead of adhering to ACID compliance, NoSQL tends towards BASE compliance in order to achieve scalability and high performance. The following are the BASE attributes that NoSQL solutions try to adopt:</p>
<ul>
<li>
<b>B</b>asically <b>A</b>vailable</li>
<li><b>S</b>oft state</li>
<li><b>E</b>ventual consistency</li>
</ul>
<p>
<b>NoSQL Categorization based on data modeling </b></p>
<ul>
<li> Key Value Stores Ex : Redis, Riak, Amazon Simple DB</li>
<li> Column Family Stores ( Big Tables ) Ex : Cassandra , HBase</li>
<li> Document databases Ex : CouchDB , Couchbase , MongoDB</li>
<li> Graph databases Ex : Neo4j, Titan</li>
</ul>
<p>
Each of these NoSQL categories provides unique advantages for specific functionalities, and selecting the right NoSQL category is critical for the design of the application. </p><p>At a high level, the specific NoSQL solution can be chosen based on the complexity of the data model and the querying associated with it.</p>
<p>
The below diagram provides a good comparison on the different NoSQL databases.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwiVFNpcTaac_PZA8GPZ9cQwZXzqpzzeNeFmqqxL9CfNCevhDTspdkTK30R3ya4V1e6bBVdMle0LT1YKEdOf1o9JdD7SugViUaiBg2gwwJIe_LUTzXNgAFe_9UuEr3cggZYyzOySdF2T6P/s1600/NOSQL+Data+Model.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiwiVFNpcTaac_PZA8GPZ9cQwZXzqpzzeNeFmqqxL9CfNCevhDTspdkTK30R3ya4V1e6bBVdMle0LT1YKEdOf1o9JdD7SugViUaiBg2gwwJIe_LUTzXNgAFe_9UuEr3cggZYyzOySdF2T6P/s320/NOSQL+Data+Model.png" /></a></div>
<span class="c20">[Source: https://highlyscalable.wordpress.com/2012/03/01/NoSQL-data-modeling-techniques/]</span></p>
<b>NoSQL based on system architecture:</b>
<p>
Based on the system architecture, NoSQL can be categorized into the following.</p>
<ul>
<li> P2P ( Ring Topology ) </li>
<li> Master Slave </li>
</ul>
<p>
Each architecture has some pros and cons and a decision has to be made based on the needs.
</p>
<p>
<table>
<tr>
<th></th>
<th></th>
<th>P2P ( Ring Topology )</th>
<th></th>
<th>Master Slave</th>
</tr>
<tr>
<th>Role</th>
<th></th>
<td>All nodes carry an equal role</td>
<th></th>
<td>Master – Slave architecture with specific responsibilities on specific nodes</td>
</tr>
<tr>
<th>Consistency</th>
<th></th>
<td>Eventual</td>
<th></th>
<td>Strong</td>
</tr>
<tr>
<th>Write/Read</th>
<th></th>
<td>Reads and writes happen through all the nodes</td>
<th></th>
<td>Writes are mostly driven through specific nodes</td>
</tr>
<tr>
<th>Availability</th>
<th></th>
<td>High Availability</td>
<th></th>
<td>Availability is somewhat compromised when the master / write node fails</td>
</tr>
<tr>
<th>Data</th>
<th></th>
<td>Data is partitioned across all nodes with replication</td>
<th></th>
<td>Data is partitioned into multiple slave nodes with replication</td>
</tr>
<tr>
<th>Examples</th>
<th></th>
<td>Cassandra, Couchbase</td>
<th></th>
<td>HBase, MongoDB</td>
</tr>
</table></p>
<b>Data read / writes:</b>
<p>The need for NoSQL-type solutions arises when you operate on huge volumes of data with high performance requirements for reads and writes.</p>
<p>Below are the typical use cases where NoSQL databases will be used</p>
<ul>
<li>Scalable databases</li>
<li>High availability and fault tolerance</li>
<li>Ever growing set of data</li>
<li>Bulk read / write operations </li>
</ul>
<p>
Some NoSQL solutions are good for write intensive workloads, some for read intensive workloads and some for mixed workloads; specific analysis has to be done to decide on the NoSQL solution based on the needs.</p>
<p>Other important concepts that I would like to highlight specific to any NoSQL solutions:</p>
<b>Sharding:</b>
<p>Sharding is one of the important concepts in NoSQL solutions, by which the data is partitioned horizontally across different nodes in the cluster. This means the data is split based on some logic, say a hash code, and spread across different nodes.</p>
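<p>A minimal sketch of hash-based sharding (my own illustration; real systems typically use consistent hashing or range partitioning): each key is routed to one of a fixed set of nodes by a stable hash.</p>
<pre><code>
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def shard_for(key: str) -> str:
    # Stable hash (md5 here) so the same key always maps to the same node.
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    return NODES[digest % len(NODES)]

for user_id in ("user:1", "user:2", "user:3", "user:42"):
    print(user_id, "->", shard_for(user_id))
</code></pre>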
<b>Replication:</b>
<p>The data is not only partitioned across different nodes but also replicated across different cluster nodes. The replication factor is a configuration of the solution. Replication gives high availability and automatic failover when a specific node goes down.</p>
<p class="c4"></p><p class="c9"><span class="c3"><b>Reference:</b></span></p><p class="c9"><span class="c28 c1">
<a class="c2" href="http://www.google.com/url?q=http%3A%2F%2Fmartinfowler.com%2F&sa=D&sntz=1&usg=AFQjCNHROry1Bmz_6TABvMoKIkfgjInLNQ">http://martinfowler.com/</a></span></p><p class="c9 c16"><span class="c1 c28">
<a class="c2" href="http://www.google.com/url?q=http%3A%2F%2Fhighscalability.com&sa=D&sntz=1&usg=AFQjCNE5fqFk-nViuqHMmahaEmHHyb3THw">http://highscalability.com</a></span></p><p class="c9 c16"><span class="c28 c1">
<a class="c2" href="http://www.google.com/url?q=http%3A%2F%2Fnosqlguide.com%2F&sa=D&sntz=1&usg=AFQjCNFqqFSnEw6MiD0wnFS730Mtw6284g">http://nosqlguide.com/</a></span>
<b>Configuring Apache Hadoop Cluster in a standalone machine (2013-05-02)</b>
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=ProgId content=Word.Document>
<meta name=Generator content="Microsoft Word 14">
<meta name=Originator content="Microsoft Word 14">
<link rel=File-List href="HadoopIntroduction_files/filelist.xml">
<!--[if gte mso 9]><xml>
<o:DocumentProperties>
<o:Author>CIS</o:Author>
<o:LastAuthor>CIS</o:LastAuthor>
<o:Revision>2</o:Revision>
<o:TotalTime>165</o:TotalTime>
<o:Created>2013-05-03T02:51:00Z</o:Created>
<o:LastSaved>2013-05-03T02:51:00Z</o:LastSaved>
<o:Pages>2</o:Pages>
<o:Words>1095</o:Words>
<o:Characters>6245</o:Characters>
<o:Company>Comcast</o:Company>
<o:Lines>52</o:Lines>
<o:Paragraphs>14</o:Paragraphs>
<o:CharactersWithSpaces>7326</o:CharactersWithSpaces>
<o:Version>14.00</o:Version>
</o:DocumentProperties>
<o:OfficeDocumentSettings>
<o:AllowPNG/>
</o:OfficeDocumentSettings>
</xml><![endif]-->
<link rel=themeData href="HadoopIntroduction_files/themedata.thmx">
<link rel=colorSchemeMapping
href="HadoopIntroduction_files/colorschememapping.xml">
<!--[if gte mso 9]><xml>
<w:WordDocument>
<w:SpellingState>Clean</w:SpellingState>
<w:GrammarState>Clean</w:GrammarState>
<w:TrackMoves>false</w:TrackMoves>
<w:TrackFormatting/>
<w:PunctuationKerning/>
<w:ValidateAgainstSchemas/>
<w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid>
<w:IgnoreMixedContent>false</w:IgnoreMixedContent>
<w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText>
<w:DoNotPromoteQF/>
<w:LidThemeOther>EN-US</w:LidThemeOther>
<w:LidThemeAsian>X-NONE</w:LidThemeAsian>
<w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript>
<w:Compatibility>
<w:BreakWrappedTables/>
<w:SnapToGridInCell/>
<w:WrapTextWithPunct/>
<w:UseAsianBreakRules/>
<w:DontGrowAutofit/>
<w:SplitPgBreakAndParaMark/>
<w:EnableOpenTypeKerning/>
<w:DontFlipMirrorIndents/>
<w:OverrideTableStyleHps/>
</w:Compatibility>
<m:mathPr>
<m:mathFont m:val="Cambria Math"/>
<m:brkBin m:val="before"/>
<m:brkBinSub m:val="--"/>
<m:smallFrac m:val="off"/>
<m:dispDef/>
<m:lMargin m:val="0"/>
<m:rMargin m:val="0"/>
<m:defJc m:val="centerGroup"/>
<m:wrapIndent m:val="1440"/>
<m:intLim m:val="subSup"/>
<m:naryLim m:val="undOvr"/>
</m:mathPr></w:WordDocument>
</xml><![endif]--><!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="267">
<w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" Priority="39" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" Name="toc 9"/>
<w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/>
<w:LsdException Locked="false" Priority="10" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Title"/>
<w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/>
<w:LsdException Locked="false" Priority="11" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
<w:LsdException Locked="false" Priority="22" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" Priority="59" SemiHidden="false"
UnhideWhenUsed="false" Name="Table Grid"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 1"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 1"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 1"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]-->
<style>
<!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;
mso-font-charset:2;
mso-generic-font-family:auto;
mso-font-pitch:variable;
mso-font-signature:0 268435456 0 0 -2147483648 0;}
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;
mso-font-charset:2;
mso-generic-font-family:auto;
mso-font-pitch:variable;
mso-font-signature:0 268435456 0 0 -2147483648 0;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;
mso-font-charset:0;
mso-generic-font-family:swiss;
mso-font-pitch:variable;
mso-font-signature:-536870145 1073786111 1 0 415 0;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{mso-style-unhide:no;
mso-style-qformat:yes;
mso-style-parent:"";
margin-top:0in;
margin-right:0in;
margin-bottom:10.0pt;
margin-left:0in;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
mso-themecolor:hyperlink;
text-decoration:underline;
text-underline:single;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-noshow:yes;
mso-style-priority:99;
color:purple;
mso-themecolor:followedhyperlink;
text-decoration:underline;
text-underline:single;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
{mso-style-priority:34;
mso-style-unhide:no;
mso-style-qformat:yes;
margin-top:0in;
margin-right:0in;
margin-bottom:10.0pt;
margin-left:.5in;
mso-add-space:auto;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;}
p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst
{mso-style-priority:34;
mso-style-unhide:no;
mso-style-qformat:yes;
mso-style-type:export-only;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
mso-add-space:auto;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;}
p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle
{mso-style-priority:34;
mso-style-unhide:no;
mso-style-qformat:yes;
mso-style-type:export-only;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
margin-bottom:.0001pt;
mso-add-space:auto;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;}
p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast
{mso-style-priority:34;
mso-style-unhide:no;
mso-style-qformat:yes;
mso-style-type:export-only;
margin-top:0in;
margin-right:0in;
margin-bottom:10.0pt;
margin-left:.5in;
mso-add-space:auto;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;}
span.SpellE
{mso-style-name:"";
mso-spl-e:yes;}
span.GramE
{mso-style-name:"";
mso-gram-e:yes;}
.MsoChpDefault
{mso-style-type:export-only;
mso-default-props:yes;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:Calibri;
mso-fareast-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;}
.MsoPapDefault
{mso-style-type:export-only;
margin-bottom:10.0pt;
line-height:115%;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;
mso-header-margin:.5in;
mso-footer-margin:.5in;
mso-paper-source:0;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:73671106;
mso-list-type:hybrid;
mso-list-template-ids:-1263355800 -419936922 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:.75in;
text-indent:-.25in;}
@list l0:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:1.25in;
text-indent:-.25in;}
@list l0:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:1.75in;
text-indent:-9.0pt;}
@list l0:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:2.25in;
text-indent:-.25in;}
@list l0:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:2.75in;
text-indent:-.25in;}
@list l0:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:3.25in;
text-indent:-9.0pt;}
@list l0:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:3.75in;
text-indent:-.25in;}
@list l0:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:4.25in;
text-indent:-.25in;}
@list l0:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:4.75in;
text-indent:-9.0pt;}
@list l1
{mso-list-id:93792557;
mso-list-type:hybrid;
mso-list-template-ids:-1363742102 67698703 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l1:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l1:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;}
@list l1:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
text-indent:-9.0pt;}
@list l2
{mso-list-id:736785779;
mso-list-type:hybrid;
mso-list-template-ids:610018190 503486840 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l2:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:38.25pt;
text-indent:-.25in;}
@list l2:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:74.25pt;
text-indent:-.25in;}
@list l2:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:110.25pt;
text-indent:-9.0pt;}
@list l2:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:146.25pt;
text-indent:-.25in;}
@list l2:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:182.25pt;
text-indent:-.25in;}
@list l2:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:218.25pt;
text-indent:-9.0pt;}
@list l2:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:254.25pt;
text-indent:-.25in;}
@list l2:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:290.25pt;
text-indent:-.25in;}
@list l2:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:326.25pt;
text-indent:-9.0pt;}
@list l3
{mso-list-id:1134367605;
mso-list-type:hybrid;
mso-list-template-ids:-1850603560 197682394 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l3:level1
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:38.25pt;
text-indent:-.25in;}
@list l3:level2
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:74.25pt;
text-indent:-.25in;}
@list l3:level3
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:110.25pt;
text-indent:-9.0pt;}
@list l3:level4
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:146.25pt;
text-indent:-.25in;}
@list l3:level5
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:182.25pt;
text-indent:-.25in;}
@list l3:level6
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:218.25pt;
text-indent:-9.0pt;}
@list l3:level7
{mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:254.25pt;
text-indent:-.25in;}
@list l3:level8
{mso-level-number-format:alpha-lower;
mso-level-tab-stop:none;
mso-level-number-position:left;
margin-left:290.25pt;
text-indent:-.25in;}
@list l3:level9
{mso-level-number-format:roman-lower;
mso-level-tab-stop:none;
mso-level-number-position:right;
margin-left:326.25pt;
text-indent:-9.0pt;}
@list l4
{mso-list-id:1184593562;
mso-list-type:hybrid;
mso-list-template-ids:-1676251714 67698697 67698691 67698693 67698689 67698691 67698693 67698689 67698691 67698693;}
@list l4:level1
{mso-level-number-format:bullet;
mso-level-text:\F076;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l4:level2
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l4:level3
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l4:level4
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l4:level5
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l4:level6
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
@list l4:level7
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Symbol;}
@list l4:level8
{mso-level-number-format:bullet;
mso-level-text:o;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:"Courier New";}
@list l4:level9
{mso-level-number-format:bullet;
mso-level-text:\F0A7;
mso-level-tab-stop:none;
mso-level-number-position:left;
text-indent:-.25in;
font-family:Wingdings;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
-->
</style>
<!--[if gte mso 10]>
<style>
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin-top:0in;
mso-para-margin-right:0in;
mso-para-margin-bottom:10.0pt;
mso-para-margin-left:0in;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;}
</style>
<![endif]--><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026"/>
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1"/>
</o:shapelayout></xml><![endif]-->
<strong>Introduction</strong><br />
In this post I explain how to set up and configure an Apache Hadoop cluster with two or more nodes on a single machine, typically your Windows laptop or desktop. This will help you build MapReduce programs, run them in a realistic cluster-like environment, and understand Hadoop better.<br />
Apache Hadoop is a free, open source software release for reliable and scalable distributed computing. It is a framework that allows distributed processing of large data sets across clusters of computers.<br />
<br />
<strong>During this Hadoop cluster setup, the following activities will be performed at a high level</strong><br />
1. Creating base nodes for the cluster<br />
2. Setting up the base operating system for the cluster<br />
3. Setting up Hadoop dependencies on the nodes<br />
4. Configuring Hadoop users and access<br />
5. Setting up password-less authentication across the cluster nodes<br />
6. Configuring Hadoop roles for the nodes<br />
7. Running the Hadoop daemons for each role<br />
8. Browsing the Hadoop HDFS and job tracker sites<br />
<br />
<strong>Creating base nodes for the cluster:</strong><br />
If you plan to try this setup on your local Windows laptop or desktop, download <a href="http://www.vmware.com/products/player">VMware Player</a>, a free tool that helps you set up virtual machines with their own local IPs, so at the end you have a simple network of servers that can talk to each other. Laptops nowadays come with multiple cores and 4 GB of memory, so it is easy to set up at least 3 nodes on your personal laptop or desktop.<br />
<br />
<strong>Setup a Linux flavor of OS on the base nodes:</strong><br />
On the base VM nodes you have set up with VMware Player, install a Linux-based OS from an ISO file. I chose Ubuntu Server, which is free to <a href="http://www.ubuntu.com/download/server">download</a>. Download the ISO and complete the VM creation with VMware Player.<br />
Once the OS installation is done, you will end up with a root or sudo user for the server. You can get the IP address of each server by typing the command <i>ifconfig</i>; note down the IP addresses of all the servers.<br />
<br />
<strong>Setup Hadoop and its dependencies:</strong><br />
We now have the servers set up with an OS and a sudo user to operate on, so we can start setting up Hadoop on the nodes.<br />
Apache Hadoop has the following dependencies:<br />
1. Java version 6 or higher<br />
2. SSH<br />
<a href="http://www.oracle.com/technetwork/java/javase/downloads/index.html">Download</a> Java and set it up on each server. I set up the JRE under /opt/jre1.6.0_45 and set JAVA_HOME in ~/.bashrc; you can verify the setup by typing <i>java -version</i> and checking the version details displayed.<br />
SSH can be installed with the command <i>sudo apt-get install openssh-server</i>.<br />
Verify SSH by running <i>ssh localhost</i> on that machine itself.<br />
Download a stable version of <a href="http://hadoop.apache.org/releases.html">Hadoop</a>; I chose the 1.0.x line for this setup.<br />
If you have downloaded the .tar.gz file, you can use the command <i>tar -zxvf {file.tar.gz}</i> to extract the contents. I extracted it to /opt/hadoop-1.0.4.<br />
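Putting the dependency setup together, here is a minimal sketch for an Ubuntu node; the Hadoop archive name and the JRE path are the ones used above, so adjust them to whatever you actually downloaded:<br />
<br />
# Install the SSH server<br />
sudo apt-get install openssh-server<br />
# Unpack the Hadoop release under /opt (assumes the tarball is in the current directory)<br />
sudo tar -zxvf hadoop-1.0.4.tar.gz -C /opt<br />
# Point JAVA_HOME at the JRE and put it on the PATH for this user<br />
echo 'export JAVA_HOME=/opt/jre1.6.0_45' >> ~/.bashrc<br />
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc<br />
source ~/.bashrc<br />
# Verify the setup<br />
java -version<br />
ssh localhost<br />
<br />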
<strong>Configure Hadoop</strong><br />
With Hadoop and its dependencies in place, we can now start configuring Hadoop on the server; this involves the following activities (a consolidated sketch follows the list):<br />
1. Create a new user, say hadoop. On Ubuntu I used the command <i>adduser hadoop</i>.<br />
2. Give sudo access to the user by editing the /etc/sudoers file:<br />
a. Run <i>sudo visudo</i><br />
b. Add the line <i>hadoop ALL=(ALL:ALL) ALL</i> to the file<br />
3. Give the hadoop user full permission on /opt/hadoop-1.0.4, where the Hadoop binaries are installed:<br />
a. <i>chown -R hadoop:hadoop /opt/hadoop-1.0.4</i><br />
b. <i>chmod -R 777 /opt/hadoop-1.0.4</i><br />
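The same user setup, as a sketch you can run on each node (paths as used above):<br />
<br />
# Create a dedicated hadoop user<br />
sudo adduser hadoop<br />
# Grant it sudo rights: run "sudo visudo" and append the line<br />
#   hadoop ALL=(ALL:ALL) ALL<br />
# Hand the Hadoop installation over to the new user<br />
sudo chown -R hadoop:hadoop /opt/hadoop-1.0.4<br />
sudo chmod -R 777 /opt/hadoop-1.0.4<br />
<br />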
You have to repeat the above steps on all the nodes in the cluster, or simply clone the virtual machines, making sure each virtual machine gets a different IP address. Assume you have created 3 nodes for this cluster.<br />
Now that we have 3 nodes, we have to decide on their roles: one node is the master node, playing the roles of namenode and jobtracker, and the other nodes play the roles of datanode and tasktracker. We can call the nodes hdpMaster, hdpSlave1 and hdpSlave2.<br />
<strong>Configuring authenticated SSH access between the master and the other nodes</strong><br />
We need to configure password-less SSH access for the hadoop user from the master node to the rest of the slave nodes. Perform the following steps to set this up.<br />
$ <i>ssh-keygen -t rsa</i> (generates the key file)<br />
Copy the key file to all the slave machines:<br />
$ <i>scp .ssh/id_rsa.pub hadoop@192.168.8.129:~hadoop/.ssh/authorized_keys</i> (Slave1)<br />
$ <i>scp .ssh/id_rsa.pub hadoop@192.168.8.130:~hadoop/.ssh/authorized_keys</i> (Slave2)<br />
You should also be able to ssh into the master itself without a password; if not, add the key to its own authorized keys:<br />
$ <i>cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys</i><br />
Once the key is added to the authorized keys, password-less access to the machines will be possible.<br />
Verify that you are able to connect over ssh to localhost and to all the slaves using the ssh command:<br />
<i>ssh localhost</i><br />
<i>ssh slave1IP</i><br />
<i>ssh slave2IP</i><br />
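As a consolidated sketch, the whole key exchange looks like this when run as the hadoop user on the master node (the slave IPs are the illustrative ones above, and the ~/.ssh directory is assumed to already exist on the slaves):<br />
<br />
# Generate an RSA key pair on the master (accept the default location, empty passphrase)<br />
ssh-keygen -t rsa<br />
# Authorize the key on the master itself<br />
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys<br />
# Push the public key to each slave's authorized keys<br />
scp ~/.ssh/id_rsa.pub hadoop@192.168.8.129:~/.ssh/authorized_keys<br />
scp ~/.ssh/id_rsa.pub hadoop@192.168.8.130:~/.ssh/authorized_keys<br />
# Verify password-less logins<br />
ssh localhost<br />
ssh 192.168.8.129<br />
ssh 192.168.8.130<br />
<br />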
<br />
<strong>Host entries for the servers:</strong><br />
Update the hosts file at /etc/hosts with the hostnames if you want to address the servers by their hostnames.<br />
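On each node the /etc/hosts entries would look something like this (the master IP is illustrative; the slave IPs are the ones used earlier):<br />
<br />
192.168.8.128   hdpMaster<br />
192.168.8.129   hdpSlave1<br />
192.168.8.130   hdpSlave2<br />
<br />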
<strong>Configure Hadoop roles for master and slaves:</strong><br />
Everything is now in place for Hadoop to start; we are at the last step of configuring the roles for the nodes and starting the cluster.<br />
On the master node, perform the following steps (a sketch of the resulting configuration files follows the list):<br />
1. Go to the HadoopHome/conf location<br />
2. Update hadoop-env.sh with the JAVA_HOME location pointing to the Java installation path<br />
3. Update core-site.xml to the following<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1UjKP1OMKIY7a4B_hxs-l-pLPOKmgAwInQiOFI_LBIHRheR_ArIu12d79xgc9blqGkX4STIeC_wPZx0-vt09p71zxNtrp9TURCaWEumE6Dy3kL3YWJfWRAs1HA1wv_k_kdQXd0W5Zz3ea/s1600/core-site.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg1UjKP1OMKIY7a4B_hxs-l-pLPOKmgAwInQiOFI_LBIHRheR_ArIu12d79xgc9blqGkX4STIeC_wPZx0-vt09p71zxNtrp9TURCaWEumE6Dy3kL3YWJfWRAs1HA1wv_k_kdQXd0W5Zz3ea/s320/core-site.png" /></a><br />
4. Update hdfs-site.xml to the following<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnZ6sg4ndMY1dwRb57edLGLFrsECKmnmzKt41qAHjuM1WADSJFE7Lj7HA1OdPYRjfy8pQNAG1DzBmIkugnlWRKbu7Q8ra9XbCWUCGl-fIXvrCPI52jmbXvnmtS1GtPV6VVR8Gyrby_PG5C/s1600/hdfssitexml.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhnZ6sg4ndMY1dwRb57edLGLFrsECKmnmzKt41qAHjuM1WADSJFE7Lj7HA1OdPYRjfy8pQNAG1DzBmIkugnlWRKbu7Q8ra9XbCWUCGl-fIXvrCPI52jmbXvnmtS1GtPV6VVR8Gyrby_PG5C/s320/hdfssitexml.png" /></a><br />
5. Update mapred-site.xml to the following<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJlSpf4hQO15JLHUEbv1EIcw8tPFUGwRB8BQeQ47bzZXAGIKZEju8jjkm10BPKOHmJWtmQ-sd4io9VbOp5rQcbz2HcxKHeCZIwKJbxtxeZmkx_aEoIWfE2heeAhjtyih6umugpsOCyE9Dg/s1600/mapredxml.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJlSpf4hQO15JLHUEbv1EIcw8tPFUGwRB8BQeQ47bzZXAGIKZEju8jjkm10BPKOHmJWtmQ-sd4io9VbOp5rQcbz2HcxKHeCZIwKJbxtxeZmkx_aEoIWfE2heeAhjtyih6umugpsOCyE9Dg/s320/mapredxml.png" /></a><br />
6. Update the masters file with the master hostname<br />
7. Update the slaves file with all the slave hostnames<br />
Repeat steps 1-4 on all the slave nodes.<br />
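For reference, on a Hadoop 1.x cluster the configuration files in the screenshots typically carry entries along the following lines. This is only a sketch: hdpMaster is the master hostname from /etc/hosts, the ports 9000/9001 and the replication factor of 2 are conventional choices I am assuming here, and the actual values for this cluster are the ones shown in the screenshots above. The commands assume you are in the HadoopHome/conf directory.<br />
<br />
# In hadoop-env.sh, point JAVA_HOME at the Java installation used earlier<br />
echo 'export JAVA_HOME=/opt/jre1.6.0_45' >> hadoop-env.sh<br />
# core-site.xml : property fs.default.name = hdfs://hdpMaster:9000 (namenode address)<br />
# hdfs-site.xml : property dfs.replication = 2 (number of HDFS block replicas)<br />
# mapred-site.xml : property mapred.job.tracker = hdpMaster:9001 (jobtracker address)<br />
# masters file holds the master hostname, slaves file holds one slave hostname per line<br />
echo 'hdpMaster' > masters<br />
printf 'hdpSlave1\nhdpSlave2\n' > slaves<br />
<br />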
The Hadoop cluster is now configured for HDFS and MapReduce, so we can start the corresponding daemons on the cluster.<br />
Step 1: Go to the HadoopHome location<br />
Step 2: Format the namenode by running the command <i>bin/hadoop namenode -format</i><br />
Step 3: Go to the bin folder and run the namenode and datanode daemons, then run the jobtracker and tasktracker daemons<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4uKfxqPQ-S6My0es6VZcqRfhRJf-OPYlzpUOgD4jynhHbdiG28xunsFNrPqYaFMJX-9HbiVH37nxlS89NHVNRIdeV-3H0rDvPmFsbSIncdivDsD3rumXdItvboGLGCL7WkQwWgRF_MOZ5/s1600/starthadoop_sh.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi4uKfxqPQ-S6My0es6VZcqRfhRJf-OPYlzpUOgD4jynhHbdiG28xunsFNrPqYaFMJX-9HbiVH37nxlS89NHVNRIdeV-3H0rDvPmFsbSIncdivDsD3rumXdItvboGLGCL7WkQwWgRF_MOZ5/s320/starthadoop_sh.png" /></a>
Option 1: Run ./start-all.sh on the master node; this will start all the daemons on all the nodes of the cluster as configured in the masters and slaves files.<br />
Option 2: Run ./start-dfs.sh on the master node to start the namenode and datanodes, then run ./start-mapred.sh to start the jobtracker and tasktrackers on the nodes.<br />
Option 3: Run the daemons individually.<br />
On the master node:<br />
./hadoop-daemon.sh start namenode<br />
./hadoop-daemon.sh start jobtracker<br />
On the slave nodes:<br />
./hadoop-daemon.sh start datanode<br />
./hadoop-daemon.sh start tasktracker<br />
You can check the logs of the nodes, and any errors during initialization, under HadoopHome/logs on each of the nodes.<br />
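End to end, a minimal start-up sequence on the master node looks like this (a sketch; run it as the hadoop user from the Hadoop home directory, e.g. /opt/hadoop-1.0.4):<br />
<br />
# One-time step: format the HDFS namenode<br />
bin/hadoop namenode -format<br />
# Start HDFS (namenode on the master, datanodes on the slaves listed in conf/slaves)<br />
bin/start-dfs.sh<br />
# Start MapReduce (jobtracker on the master, tasktrackers on the slaves)<br />
bin/start-mapred.sh<br />
# If a daemon does not come up, look at its log file<br />
ls logs/<br />
<br />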
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgP-3dfnN39s4aTTZW_s26UiTUcyAEkxB9fDOuyeK0O_sHUQjwt7uVihCjN7krZnVcaWpobSR0wlTYw0I4MFpysHUje_uvtij8NZ6xVJ5H9Pr7dICU6itbTveN8_TLUE71p9iJhXHSFOloX/s1600/VMNodes.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgP-3dfnN39s4aTTZW_s26UiTUcyAEkxB9fDOuyeK0O_sHUQjwt7uVihCjN7krZnVcaWpobSR0wlTYw0I4MFpysHUje_uvtij8NZ6xVJ5H9Pr7dICU6itbTveN8_TLUE71p9iJhXHSFOloX/s320/VMNodes.png" /></a>
If everything went fine, you should be able to browse the following sites to track HDFS and Hadoop jobs:<br />
<a href="http://masternode:50070/dfshealth.jsp">http://masternode:50070/dfshealth.jsp</a> - to track HDFS and its health<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxrGTqbzv0MIIWekvwPlgSV_aMkNBR2hilbAJ5lOCnp3iadSrOivWF-h72ACXCrnqIas9zZPMExrz04uDcU-x93kVLo-7Obx_ZhEVgXFoykErGFmqKWG_tcbUMDiwNRBljPndHspxZ21cJ/s1600/hdfssite.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgxrGTqbzv0MIIWekvwPlgSV_aMkNBR2hilbAJ5lOCnp3iadSrOivWF-h72ACXCrnqIas9zZPMExrz04uDcU-x93kVLo-7Obx_ZhEVgXFoykErGFmqKWG_tcbUMDiwNRBljPndHspxZ21cJ/s320/hdfssite.png" /></a><br />
<a href="http://masternode:50030/jobtracker.jsp">http://masternode:50030/jobtracker.jsp</a> - to track running jobs and their status<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhi0QoQ0uP5UzoDy-qbvlPHmyuAn20En5IWDrca0mfXG3xx59CxPTpUFu0xFqDNhtoyeqUJ_P7Awv8du_NYeVlIyIXAZXPsnkxudWo3un2y4XIB97sr5qG9kqMPn9a2iegCtbomFwo1Eyi_/s1600/mapredsite.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhi0QoQ0uP5UzoDy-qbvlPHmyuAn20En5IWDrca0mfXG3xx59CxPTpUFu0xFqDNhtoyeqUJ_P7Awv8du_NYeVlIyIXAZXPsnkxudWo3un2y4XIB97sr5qG9kqMPn9a2iegCtbomFwo1Eyi_/s320/mapredsite.png" /></a><br />
<br />
Reference: <a href="http://hadoop.apache.org/docs/stable/cluster_setup.html">Apache Hadoop cluster setup</a>
</html>Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-23620862835002232042011-02-27T09:20:00.000-08:002011-02-27T09:26:18.871-08:00A2A 'Cloud Comparison' - Database as a ServiceThis is part of my series of articles on A2A Cloud Comparison. In my previous articles I was explaining my views of A2A Comparison with <a href="http://breezeoncloud.blogspot.com/2010/10/a2a-cloud-comparison-compute-service.html">Compute </a>and <a href="http://breezeoncloud.blogspot.com/2010/11/a2a-cloud-comparison-storage-services.html">Storage</a>. In this article I will provide my views on Database as a Service with Amazon and Azure. <br /><br /><strong>Introduction</strong><br /> We all know how data is crucial to an application take an example whether it is a banking application or an online music store application, data is very important to the whole system. Say you have recently registered and created a user on a specific site and if the user identity is not found next time when you login to the site think how much hesitation will get and you will think twice before continuing to use the site. Think of what will happen if you lose some data in a critical financial application. Losing the data will incur heavy loss to the system or make the application really obsolete. The reason why I am talking about data criticality is because in this blog I am going to talk about the database as a service offering from the cloud computing providers.<br />When we talk about data most of the applications store their data in a database and managing the database will be a crucial task for the system. Database administration helps to manage the database and assures to keep the database updated and highly available. I want to list some to tasks performed as part of database administration<br />1. Patching the database software up to date<br />2. Taking backups of the database<br />3. Maintaining the backup for the specified retention period<br />4. Point in time recovery<br /><br /><strong>Database as Service</strong> <br />What if all the database administration tasks have been taken care and have ability to scale the capacity with high availability and reliability? Database as a Service is the answer for that. <br /><br /><strong>Amazon and Azure Offerings</strong><br /> Both Amazon and Azure provides offerings in the database as a service space and are differentiated in some ways. Amazon provides its offering as RDS (Relational Database as a Service) while Microsoft Azure provides its offering as SQL Azure.<br /> Amazon operates RDS in Infrastructure as a Service space while Microsoft SQL Azure operates at Platform as a Service space, I will be explaining it in detail below. Following the general cloud pricing model this service will also be charged in a Pay as you use model.<br /><br /><strong>RDS:</strong><br />Amazon offering for Database as a service called <a href="http://aws.amazon.com/rds/">RDS </a>(Relational Database as a Service) provides database service for MYSQL database. 
Recently Amazon has made an announcement that will extend <a href="http://aws.amazon.com/rds/oracle/?utm_source=OraclePR&utm_medium=RDSLandingPage&utm_campaign=Oracle">RDS for Oracle database</a>, that means you will be able to create an Oracle database with all the setup ready in matter of minutes and you can able to create and delete the instances with hourly chargeback model and with all database administration tasks taken care..Sounds interesting?<br />Every RDS instance in Amazon will get a dedicated virtual server instance, database storages with all the data backup and retention policies configured, this is why I called RDS operating in Infrastructure as a Service space and because of its underlying virtualization model the instance can be migrated to a bigger server configuration if needed. Database servers can also be configured for Read replication or Multi Availability Zone deployment for high availability and Disaster Recovery. <br />Recently I have to validate the performance of Oracle database in a specific use case for a POC, for scenarios like this it will be difficult in non cloud model because Oracle software licenses will be charged for duration of a year at least and the licenses are Processor based or Socket based. It will be difficult to compromise with express edition or a single socket license as we have to validate performance scenario and now with cloud model it is easy to execute, create and use it for the period needed and release it when POC is done , as simple as that.<br /><br /><strong>SQL Azure:</strong><br /> Microsoft offering for Database as a Service called <a href="http://social.technet.microsoft.com/wiki/contents/articles/inside-sql-azure.aspx">SQL Azure </a>provides service for SQL Server database. With SQL Azure we will be able to create databases for 1GB, 5GB up to a maximum of 50GB. We can create a smaller DB during creation and can later alter to a maximum of 50GB with all the database management tasks taken care operating in a pay as you use model.<br /> Microsoft operates SQL Azure in a way bit different from Amazon RDS. Unlike RDS SQL Azure does not spare a dedicated virtual server for databases instead multiple SQL Azure databases will be hosted in a bigger SQL Server instance and will be operating more like a shared multi tenant environment with all the tenant specific security measures taken care, this architecture will be abstracted from the end user as the end user will be able to operate the database in a usual way and the user is assured with high availability and scalability.<br /> One thing that has to take care in SQL Azure is that it can scale to a maximum of 50GB as of now and beyond that we have to plan for horizontal scaling of database in our application architecture. <br /><br /><strong>References:</strong><br />http://social.technet.microsoft.com/wiki/contents/articles/inside-sql-azure.aspx<br />http://aws.amazon.com/rds/Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com2tag:blogger.com,1999:blog-9141828500783422104.post-7679139375512013472010-11-08T00:14:00.000-08:002010-11-08T00:20:25.069-08:00A2A ‘Cloud Comparison’ – Storage ServicesThis is part of my series of article 'A2A Cloud Comparison' ; in my <a href="http://breezeoncloud.blogspot.com/2010/10/a2a-cloud-comparison-compute-service.html">previous article </a> I have compared Amazon and Azure on Computing Services space. 
In this article I have given my view on Cloud Storage Services in general and the corresponding services by Amazon and Azure Cloud Providers.<br /><strong>Storage in Cloud</strong> <br />One of the important services that are provided by Cloud is the Storage Service. Cloud Storage provides enormous amount of storage space that is accessible over internet with features added on top of it. Also as with other cloud services this comes with Pay as you use model. Let us understand why the storage services in cloud is going to be important, year by year the cost of storage disks keep on reducing but still the enterprise storage cost keep on increasing year by year, the problem with conventional storage costing is that even though the hardware cost keeps on reducing cost on operation and maintenance keeps the total cost increased, also it is difficult to keep with the exponential need in the storage needs. Cloud Storage Services tries to address all these problems.<br /><br /><strong>Understanding Storage in Cloud:</strong> <br />Cloud Storage operates on a base concept called Storage Virtualization. Storage Virtualization system provides a logical data store that maps over the physical storage system through a mapping table. <br />Storage Virtualization in general achieves the following<br />1. Location independence – Abstracts the physical location and thus enables data movement across different physical locations.<br />2. Replication – Enables replication of the storage data across multiple locations<br />3. Data migration – Enables movement of storage data to a faster / better infrastructure if needed.<br />4. Dynamic scaling - Enables to scale the capacity of the storage space when needed<br /><br /><strong>Storage Services in Amazon and Azure</strong><br /> Amazon, Azure the top public cloud computing service providers provides services in Storage segment. Both of them provide similar type of services in storage segment. These storage services can be accessed by a REST based API or web service API calls.<br /><br /><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiDi0SMgdU_cnXLPM4uWwxqayxTlGEbaSHUwbp2sOcghd4_IDpM0TSMrT1vW5IMZFe79yvgAz4tJR4kssP9Xewp8j2pvRcMutkWsrQkO6tA0oM92PKFBVU-wgimCg4WndpEFmcCVdkFEzl/s1600/storage1.JPG"><img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 320px; height: 95px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiiDi0SMgdU_cnXLPM4uWwxqayxTlGEbaSHUwbp2sOcghd4_IDpM0TSMrT1vW5IMZFe79yvgAz4tJR4kssP9Xewp8j2pvRcMutkWsrQkO6tA0oM92PKFBVU-wgimCg4WndpEFmcCVdkFEzl/s320/storage1.JPG" border="0" alt=""id="BLOGGER_PHOTO_ID_5537090070248438098" /></a><br /> <br />Let us try to compare the cost of these storage services by these vendors. Generally the cost of these services will vary based on geographic location and also will be revised (generally reduced), the costing I am mentioning is as of today. 
<br /><br /><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxghmn3Y_z0D4z3MVaigz72EpTYWuq3X2LMuEuv9WWHI4PvDcBD6FXtfZEAPg51BB_5E1LXR5qWjjGdz2ClLbFLJiABADEiOAmREW-eb513jvhwATM0zi-gU2dzaxG9fwKaWmc61fb8xGq/s1600/storage2.JPG"><img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 320px; height: 77px;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxghmn3Y_z0D4z3MVaigz72EpTYWuq3X2LMuEuv9WWHI4PvDcBD6FXtfZEAPg51BB_5E1LXR5qWjjGdz2ClLbFLJiABADEiOAmREW-eb513jvhwATM0zi-gU2dzaxG9fwKaWmc61fb8xGq/s320/storage2.JPG" border="0" alt=""id="BLOGGER_PHOTO_ID_5537090300633625266" /></a><br />(Please note that billing fees are subject to change.)<br /><br />Please refer to the following links for the detailed pricing<br /><a href="http://aws.amazon.com/s3/#pricing">http://aws.amazon.com/s3/#pricing</a><br /><a href="http://www.microsoft.com/windowsazure/pricing/">http://www.microsoft.com/windowsazure/pricing/</a><br /><br /><strong>Security options</strong> <br />Security in Data Transition:<br /> Security in data transition can be achieved by means of secured http channel.<br /> Security in Data Source:<br /> Highly sensitive data that needs to be secured at the source can be achieved by means of data encryptions.<br /> Security in Access:<br /> Cloud providers are coming up with Authentication, Authorization mechanism by which access to these resources can be secured.<br /> Security in Virtualization:<br /> Virtual Servers in the same physical servers are properly secured by means of virtual firewall by the cloud providers and hence data is kept secured between virtual servers on same physical server.<br /> <br /><strong>Best Practices</strong><br />1. Choose the Cloud Storage Data centre location closer to the end user<br />2. Segregate the data into different buckets(Amazon) or Containers(Azure) so that different level of security access can be achieved<br />3. 
Partition the data properly to achieve higher throughput and efficiency.<br /><br /><strong>CDN Integration</strong><br /> Both Amazon and Azure provides Content Delivery Network (CDN) that can be integrated with their storage services to provide closer delivery of data to the clients with higher performance and better reliability.<br /><br /><strong>Tools</strong> <br />There are few cloud storage explorer management tools that are available that facilitates a user to view the data on cloud storage<br />Cloudberry Explorer - <a href="http://cloudberrylab.com/">http://cloudberrylab.com/</a><br />Explorer Tools: S3Fox, BucketExplorer, awszone.com<br />Azure Storage Explorer - <a href="http://www.cerebrata.com/Blog/file.axd?file=2009%2F10%2Fcomparing_azure_storage_management_tools.pdf">http://www.cerebrata.com/Blog/file.axd?file=2009%2F10%2Fcomparing_azure_storage_management_tools.pdf</a><br />Azure Storage Manager - <a href="http://azurestoragemanager.codeplex.com/">http://azurestoragemanager.codeplex.com/</a><br /><br /><strong>Other Cloud Storage Providers in the market:</strong><br />Nirvanix - <a href="http://www.nirvanix.com/">http://www.nirvanix.com/</a><br />EMS Automos - <a href="http://www.atmosonline.com/">http://www.atmosonline.com/</a>Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com1tag:blogger.com,1999:blog-9141828500783422104.post-52271100946588899002010-10-19T01:16:00.000-07:002010-10-19T01:19:09.894-07:00A2A ‘Cloud Comparison’ – Compute Service<strong>A2A ‘Cloud Comparison’ – Compute Service</strong><br /><br />As many of us know Amazon and Azure are among the major providers in the public cloud service space. This will be series of blogs depicting my views on comparing Amazon to Azure (A 2 A) on public cloud services on various dimensions of their services like Compute, Storage, Bandwidth, Pricing, Security, DB Services, and CDN etc. Please tag to this space to follow closely on the series.<br />In this current blog I have taken compute service offering from both of these providers and provided their features as per my knowledge.<br /><br /><strong>Amazon EC2 Compute Instances: </strong><br />Amazon provides services in Infrastructure services space where in compute instances it provides compute services in terms of virtual servers, the compute instances so called EC2 (Elastic Compute) instances provide different flavours in terms of hardware configuration and software configuration, some of the flavours in hardware are Micro, Small, Large, XLarge, High CPU, High Memory etc., you can find more details of it here at <a href="http://aws.amazon.com/ec2/instance-types/">http://aws.amazon.com/ec2/instance-types/</a> , the costing of the instances varies based on the flavour. Each instance flavour can differ in terms of hardware configuration and software configuration. Amazon as a provider provides instance for some predefined software like Windows Server 2003, 2008, SQL Server editions, RHEL etc. 
In addition amazon has partnership with major vendors like IBM, Oracle and provides pre built alliances, for example you can have a prebuilt appliance with oracle 11g with different hardware configuration provided by Oracle, and similarly for IBM you have appliances provided by them.<br />Details of the partnership EC2 instances for IBM and Oracle are available under<br /><a href="http://aws.amazon.com/solutions/global-solution-providers/oracle/">http://aws.amazon.com/solutions/global-solution-providers/oracle/</a><br /><a href="http://aws.amazon.com/solutions/global-solution-providers/ibm/">http://aws.amazon.com/solutions/global-solution-providers/ibm/</a><br /><br />Some of the benefits you can find with Amazon Compute instances are<br />1. Prebuilt appliance and save your time and avoid expertise from setting up with proper environment<br />2. Some of the instances through partnership comes as Pay as you use model and hence avoids licensing and costing issues, suppose you want to test or do some POC with IBM Web sphere Portal Server for a week or even a day you can very well find the instance and use it with amazon ec2 instance in no matter of time.<br />3. Many software vendors started providing their products through Amazon instances with the correct environment set, this way it becomes easy for customers to try out any software of their interest with less turnaround time.<br />4. Start and terminate instances when ever needed and pay only for used time.<br />5. Set firewall and other security for the instances as you need.<br />6. Ability to monitor the health status of the instances. <br />7. Easy to migrate existing applications with same flavour on the cloud platform.<br /><br /><strong>Azure Compute instances:</strong><br />Microsoft Azure as we know operates in Platform Services layer, in the sense user won’t be exposed to the server directly, but when it hosts the application it provides a virtual server for running the application. Similar to Amazon Azure also provides some option on the virtual server configuration like Small, Medium, Large, Extra Large etc. Details of the instance can be found at <a href="http://www.microsoft.com/windowsazure/windowsazure/default.aspx">http://www.microsoft.com/windowsazure/windowsazure/default.aspx</a> With respect to operational model Azure provides compute instances in two different flavours as web role and worker role. Web role instances are used when the applications needs front end handlers handled by IIS web server and worker roles are used when the application needs a back end handling process ex: a batch job application or a windows service application.<br /><br />Benefits of Azure Compute Instances:<br />1. Instances are self health monitored by Azure Fabric <br />2. Auto scaling can be enabled on the instances.<br />3. Control security policy over the instance.<br />4. Easy to build and migrate applications based on IIS7 and ASP.Net<br />5. Development fabric on Windows Azure SDK provides a simulated environment for service deployments and role instances on local machine.<br /><br />To make a comparison study on Amazon and Azure with respect to compute instances<br />1. Amazon Compute instances are at infrastructure level and hence have more control over the instances, while for Azure Compute the control is limited as it provides platform services. 
In Azure, some of the overhead such as application monitoring and high availability is taken care of by the Azure Fabric, whereas in Amazon we have to integrate a few Amazon services to achieve this ourselves.<br />2. Azure allows deploying only one role per compute instance, whereas in Amazon you can deploy multiple applications/services as we do with normal servers. For example, if you want to deploy an ASP.NET-based application and a WCF service at the back end, you may need two compute instances in Azure, whereas on an Amazon EC2 instance you can deploy them in the same virtual machine.<br />3. Azure instances are self-monitored and controlled by the Fabric, whereas for Amazon EC2 instances we add a service called CloudWatch to monitor specific instances.<br />4. Azure instances only allow running applications on a Windows environment, whereas in Amazon we can run applications on both Windows and Linux environments.<br />5. Applications with third-party dependencies or commercial off-the-shelf products are less suitable for migration to Azure, as those dependencies need to be available on the Azure platform and licensing of the products needs to be worked out.<br />Microsoft is planning to release virtual servers (VM Role) in the Infrastructure as a Service space, similar to Amazon, in the near future <a href="http://blogs.msdn.com/b/usisvde/archive/2010/03/29/vm-support-in-windows-azure.aspx">http://blogs.msdn.com/b/usisvde/archive/2010/03/29/vm-support-in-windows-azure.aspx</a>; with the VM role the Azure platform will gain more power and more benefits for migrating Microsoft-based applications to the cloud.Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-59011634079891593642010-08-15T02:57:00.000-07:002010-08-15T03:02:34.701-07:00Cloud Hosting versus Web HostingI have talked about cloud services in terms of infrastructure services, platform services and software services. When detailing infrastructure services, many people have this doubt in mind: how do cloud infrastructure services (e.g. Amazon) differ from normal web hosting providers? If I use virtual servers from a web hosting provider, how is cloud hosting different? I thought I would share my thoughts on that.<br /><br />Let us take some time to talk about web hosting providers. There are different types of providers: with a dedicated hosting provider we can rent physical servers not shared with anybody and have full control over the server, including server administration; with a shared web hosting provider we share the server space with others to be cost efficient but have less control over the server, such as its administration; we can also procure and use virtual servers.<br /><br />So what is the big deal with cloud infrastructure providers like Amazon? They also provide servers in the form of virtual servers, so what is the big difference, and what makes Amazon a cloud provider?<br /><br />When we say cloud provider, the main difference to observe is the utility model of computing in all dimensions of usage: chargeback happens based on how much we utilize cloud elements like compute units, bandwidth, power usage, storage, etc. Also, the turnaround time to set up infrastructure with Amazon is much shorter and simpler compared to web hosting providers.<br /><br />For example, when we rent a server with a hosting provider we have to commit to their space, say in terms of months or years.
It is not easy to dynamically expand and reduce the space depending on our needs; we need a minimum commitment for a specific duration. We also have to get dedicated internet bandwidth for our application's needs. Similarly, we have to procure storage for our needs, and all of this requires a minimum commitment with the provider. We cannot dynamically scale up and down instantly with general web hosting providers.<br /><br />Taking Amazon as an example of a cloud infrastructure hosting provider, we can see how the cloud elements are used in a utilization model. Amazon provides a simple web interface in the form of a Firefox plugin called 'ElasticFox', with which any user having an Amazon account can securely create and destroy instances, attach and detach disks, and set security settings for the instances. It also exposes an SDK to operate on these elements, so you can operate on them programmatically based on your application's needs. So you basically pay for what you use and scale dynamically for seasonal needs. It also assures high availability.<br /><br />In addition to these, Amazon's infrastructure services provide blob storage, SimpleDB, the Amazon Relational Database Service, simple queue services and notification services. Using all of these services in a utilization model, you can build an architecture that uses this infrastructure effectively to operate your application in an OPEX (Operational Expense) model rather than a CAPEX (Capital Expense) model.<br /><br />Cloud platform services like Microsoft Azure and Google App Engine provide more services built on top of infrastructure services, and cloud software service providers like Salesforce provide services at a higher level than platform services.<br /><br />Getting the power of cloud computing ...Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com5tag:blogger.com,1999:blog-9141828500783422104.post-47263182506927928272010-05-10T09:36:00.000-07:002010-05-10T09:39:45.673-07:00SaaS(Software as a Service) versus CloudMany of us will have this doubt in mind: what is a SaaS (Software as a Service) application, and what is a cloud application? Can I say all SaaS applications are cloud applications, or is the reverse true? How are the two interrelated?<br /> Do all cloud applications provide a SaaS type of service? When I develop an application, say on Windows Azure or Amazon EC2, will I get the application in a SaaS model? The answer is a big no. <br /> When we define cloud computing we list SaaS as one of its services, so what does that mean? Let us try to understand what SaaS is. SaaS is Software as a Service, in which the application is made available as a service: when a new customer wants to use the application, he can simply pay and on-board as a tenant, perform whatever level of customization is offered, and use it with the specified level of data security and isolation he needs. So how is this SaaS type of application related to cloud computing?<br /> A SaaS type of application is comparatively difficult to design and implement because of its extensive functionality. High availability and massive scalability are some of the basic requirements of SaaS applications, and cloud computing techniques help to solve high availability and scalability in a simple way.
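<br /><br />To make the tenancy idea above a little more concrete, here is a minimal, illustrative sketch (my own example, not from the original post) of one common SaaS data-isolation pattern: a shared table in which every row carries a tenant identifier and every query is filtered by the calling tenant. The table and column names are assumptions chosen purely for the illustration.<br /><pre>
import sqlite3

# Shared-schema multi-tenancy: one table, one tenant_id column per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (tenant_id TEXT, order_id TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES ('tenant_a', 'o-1', 10.0)")
conn.execute("INSERT INTO orders VALUES ('tenant_b', 'o-2', 99.0)")

def orders_for(tenant_id):
    # Data isolation: a tenant only ever sees rows tagged with its own id.
    rows = conn.execute(
        "SELECT order_id, amount FROM orders WHERE tenant_id = ?",
        (tenant_id,),
    )
    return rows.fetchall()

print(orders_for("tenant_a"))  # [('o-1', 10.0)]
</pre>Real SaaS applications layer far more on top of this (per-tenant customization, encryption, separate schemas or databases for stricter isolation), but the principle of scoping every operation to a tenant stays the same.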
<br /> To put it simply, cloud computing enables building SaaS applications easily; SaaS enablement is achieved more readily through cloud computing techniques.Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-84117361909681681032010-03-16T09:16:00.000-07:002010-03-16T09:54:52.720-07:00Cloud computing segmentsCloud computing solutions can be classified into three broad segments: solutions on a private cloud, a public cloud or a hybrid cloud. <br /><br />Enterprise customers who already have datacenters in place will prefer to migrate those datacenters to a private cloud; likewise, companies for which security is the top-most concern, and which accept no excuses until they find a proven record of public cloud usage, will prefer to take the path towards private cloud solutions. <br /><br />Companies that operate on huge amounts of data, manipulate that data on a temporary or permanent basis, and want to share the data across the business will prefer to utilize public cloud storage and computing. For example, media and entertainment companies will prefer to move towards the public cloud, where they can store and share huge volumes of data in a widely distributed public cloud.<br /><br />Companies that have medium-sized data centers and want to extend to the public cloud on demand, and for less critical applications, will prefer to build hybrid solutions. For example, corporates can run a medium-sized data center for highly critical applications and use the public cloud for less critical applications; that way they can manage concerns about security as well as costs.<br /><br />In forthcoming posts I will talk about the players in the private cloud, public cloud, etc.Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-13949213449955944382009-11-14T09:35:00.000-08:002009-11-14T10:08:15.907-08:00Pay per second for mobile calls - inspired from cloud model ???When I was watching television recently I noticed a few ads from various Indian telecom service providers about a pay-per-second billing scheme. Initially a new service provider came up with the pay-per-second billing model, and consequently all the others tended to follow suit in order to keep up with the competition. OK, I understand you are wondering why I am talking about this here, right?<br />I was able to relate this model of billing to the cloud model, so I thought I would put my thoughts down here to give you better clarity on the cloud model.<br />Before this 'pay per second' model, the mobile providers' unit of billing was, say, 30 seconds; so if you made a call and completed it in 1 second you still had to pay for the whole 30-second unit, and the question arose: why do I have to pay for the remaining 29 seconds which I haven't used?<br />Similarly, I can compare this with the cloud model of billing. Earlier, applications hosted on servers had the server's resources reserved whether those resources were utilized or not; the resources might have been utilized effectively only during peak load periods and under-utilized the rest of the time. Now, with the cloud model, you pay only for the resources you have actually utilized, and only for that specific period.<br />Did these service providers get inspired by the cloud model?
:)Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-89880754331335158682009-11-11T08:53:00.000-08:002009-11-12T08:59:53.935-08:00Why Cloud the buzz word now...In my previous post I gave an introduction to cloud computing; now I can explain briefly why 'cloud' is the technology buzzword of today. From an investor's perspective, one of the biggest benefits cloud computing provides is that it reduces Capital Expense (CapEx).<br /><br /> I can explain it more clearly with an example. Suppose you had an innovative idea to develop an e-business application. You feel confident about the application, and your business analysis says the application has to support around 1000 simultaneous users. You have developed the application and now you have to make it ready for those 1000 simultaneous users.<br /><br />What can you do now? Do some performance tuning to make the application ready, do capacity planning for 1000 users, procure hardware to serve that many users, and invest a huge amount to procure that hardware. OK, you have done everything and the application is hosted. What if the application does not take off as you expected, or is used by only 100 users and not the 1000 you predicted? The huge sum you invested in hardware is under-utilized and you are not making the money you predicted.<br /><br />I can offer a simple solution for this: once you have the application ready, you can host it in an environment that takes care of backing up the application, manages failover and handles scaling based on demand. If all of this can be done at a cost of less than 10 Indian rupees (around 0.12 USD) per hour, what do you do? Yes, the answer is the cloud: public cloud providers like Amazon, Microsoft and Google are currently providing cloud environments at the kind of cost I have mentioned. See how much the investment risk has been reduced, and how much the capital expense has been reduced.<br /><br />Also, if the application is a hit and you want to scale it, you do so with a mere change to a configuration file. You pay for what you use in the cloud. What else do you want? You get why the cloud is such a buzzword now.Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0tag:blogger.com,1999:blog-9141828500783422104.post-49549927126981401902009-11-07T22:45:00.000-08:002009-11-07T23:41:49.277-08:00What is cloud computing ?Cloud computing is a methodology by which resources are consumed dynamically, on demand, over the internet, where the resources can be storage, memory and compute cores, and extend to infrastructure, platform and application.<br /><br />Think of the change we had when web technologies migrated from static to dynamic content; I can compare cloud computing to a revolution of that kind, in that it provides the ability to consume resources dynamically.<br /><br />No wonder Gartner has predicted 'Cloud Computing' as the top strategic technology that most organizations will drive towards during the year 2010 <a href="http://www.gartner.com/it/page.jsp?id=1210613">http://www.gartner.com/it/page.jsp?id=1210613</a><br /><div><br /></div><div>A few factors that drive cloud computing:</div><div><br /></div><div>1. Effective utilization of resources</div><div>2. Capacity on demand</div><div>3. 
Pay as you use model</div><div>4. Green IT</div><div>5. Reduced expense on hardware</div>Aravindhttp://www.blogger.com/profile/06215275043383791502noreply@blogger.com0