Cross section vs Time Series Data

Dedunu
There are two types of data sets based on time.

Cross-Section Data: Cross-section data is collected at the same point in time. There might be different variables, but all of them are collected for the same period of time. So you won't see any time column.

Time-Series Data: Time-series data extends across periods. The same variable is recorded for different time periods.
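The difference can be sketched with two tiny data sets. This is only an illustration: the countries, years and figures are made up, and has_time_column is a hypothetical helper that encodes the "no time column" rule of thumb from above.

```python
# Cross-section data: several variables, many units, ONE period.
# All values below are invented for illustration.
cross_section = [
    {"country": "A", "population_m": 10.0, "gdp_bn": 50.0},
    {"country": "B", "population_m": 22.5, "gdp_bn": 81.0},
    {"country": "C", "population_m": 5.2, "gdp_bn": 30.5},
]

# Time-series data: ONE variable recorded over several periods.
time_series = [
    {"year": 2013, "gdp_bn": 46.0},
    {"year": 2014, "gdp_bn": 48.5},
    {"year": 2015, "gdp_bn": 50.0},
]


def has_time_column(rows, time_keys=("year", "date", "period")):
    """Return True if any row carries one of the assumed time column names."""
    return any(key in row for row in rows for key in time_keys)


print(has_time_column(cross_section))
print(has_time_column(time_series))
```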

Type of Variables

Dedunu
Quantitative Variable: A variable which can be measured numerically.

Discrete Variable: A variable whose possible values can be counted. The values need not be whole numbers.

Continuous Variable: A variable which can take any value within a range, so its possible values cannot be counted. This contrasts with discrete variables.

Qualitative Variable: A non-numerical variable. Sometimes qualitative variables are represented by numbers, but it is meaningless to perform arithmetic operations on those variables. ...
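A small sketch of why arithmetic on a coded qualitative variable is useless. The blood group coding (1=A, 2=B, 3=AB, 4=O) and all the sample values are assumptions made up for this example.

```python
from collections import Counter
from statistics import mean

# Discrete: countable set of possible values, not necessarily whole numbers.
shoe_sizes = [7.5, 8.0, 8.0, 9.5]

# Continuous: any value within a range is possible.
heights_cm = [162.4, 170.15, 181.0]

# Qualitative variable stored as numbers (hypothetical coding).
CODE_TO_LABEL = {1: "A", 2: "B", 3: "AB", 4: "O"}
blood_group_codes = [1, 2, 2, 4]


def summarize_qualitative(codes, labels):
    """Count how often each category occurs instead of averaging the codes."""
    return Counter(labels[code] for code in codes)


# The average of the codes is a number, but it matches no blood group.
meaningless_average = mean(blood_group_codes)

# Counting per category is the meaningful summary.
counts = summarize_qualitative(blood_group_codes, CODE_TO_LABEL)

print(meaningless_average)
print(counts)
```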

Point Estimation and Interval Estimation

Dedunu
Point Estimation: A point estimate is a single statistic calculated from a sample data set. It serves as a best guess of the population parameter.

Interval Estimation: An interval estimate describes a range which is likely to contain the value of the population parameter. This contrasts with point estimation.
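Both ideas can be sketched for a population mean. The sample values are made up, and the interval uses the normal approximation, which is an assumption of this sketch; for a sample this small a t-based interval would usually be wider.

```python
import math
import statistics


def mean_with_interval(sample, confidence=0.95):
    """Return (point estimate, (lower, upper)) for the population mean.

    Uses the normal approximation: mean +/- z * standard error.
    """
    n = len(sample)
    point = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    # z value for the requested two-sided confidence level
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return point, (point - z * std_err, point + z * std_err)


# Hypothetical measurements, invented for illustration.
sample = [4.8, 5.1, 5.0, 4.9, 5.2]
point, (lower, upper) = mean_with_interval(sample)

print("point estimate:", point)
print("interval estimate:", (lower, upper))
```

The point estimate is one number, while the interval estimate is a pair of bounds around it. Raising the confidence level widens the interval.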

Apache Spark Job with Maven

Dedunu
Today, I'm going to show you how to write a sample word count application using Apache Spark. For dependency resolution and build tasks, I'm using Apache Maven. However, you can use SBT (Simple Build Tool) instead. Most Java developers are familiar with Maven, hence I decided to show an example using Maven. This application is pretty much similar to ...

Getting Public Data Sets for Data Science Projects

Dedunu
All of us are interested in doing brilliant things with data sets. Most people use Twitter data streams for their projects. But there are a lot of free data sets on the Internet. Today, I'm going to list a few of them. I found almost all of these links in a Lynda.com course called Up and Running with Public Data Sets. ...

vboxdrv setup says make not found

Dedunu
After you update the kernel you need to run vboxdrv setup. But if you are trying to compile it for the first time, or after removing the build-essential package, you might see the error below.

user@ubuntu:~$ sudo /etc/init.d/vboxdrv setup
[sudo] password for user:
Stopping VirtualBox kernel modules ...done.
Recompiling VirtualBox kernel modules ...failed! (Look at /var/log/vbox-install.log to find out what went wrong)
user@ubuntu:~$ cat /var/log/vbox-install.log
/usr/share/virtualbox/src/vboxhost/build_in_tmp: 62: /usr/sh

How to create an EMR cluster using Boto3?

Dedunu
I wrote a blog post about Boto2 and EMR clusters a few months ago. Today I'm going to show how to create EMR clusters using Boto3. The Boto3 documentation is available at https://boto3.readthedocs.org/en/latest/.
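A minimal sketch of what such a call can look like, not necessarily the code from the original post. It uses the Boto3 EMR client's run_job_flow call. The cluster name, release label, instance types, instance count, region and the default role names are all assumptions chosen for illustration, and running create_cluster needs boto3 installed plus valid AWS credentials.

```python
def build_cluster_request(name, release_label="emr-4.2.0", instance_count=3):
    """Build keyword arguments for run_job_flow (all values are assumptions)."""
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Instances": {
            "MasterInstanceType": "m3.xlarge",
            "SlaveInstanceType": "m3.xlarge",
            "InstanceCount": instance_count,
            # Keep the cluster running after any steps finish.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "Applications": [{"Name": "Hadoop"}],
        # Default EMR roles; they must already exist in the AWS account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def create_cluster(request, region_name="us-east-1"):
    """Send the request to EMR and return the new cluster (job flow) id."""
    # Imported here so the request can be built and inspected without boto3.
    import boto3

    client = boto3.client("emr", region_name=region_name)
    response = client.run_job_flow(**request)
    return response["JobFlowId"]


request = build_cluster_request("test-cluster")
print(request["ReleaseLabel"])
# cluster_id = create_cluster(request)  # uncomment to actually create a cluster
```

Building the request separately from sending it makes it easy to review what will be created before any cost is incurred.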

HDFS - How to recover corrupted HDFS metadata in Hadoop 1.2.X?

Dedunu
You might have Hadoop in your production environment, and sometimes terabytes of data reside in Hadoop. HDFS metadata can get corrupted. The Namenode won't start in such cases. When you check the Namenode logs you might see exceptions.

ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
    at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
    at org.apache.hadoop.dfs.FSEditLog.loadFSE

How to fix InsecurePlatformWarning on Ubuntu?

Dedunu
Python modules sometimes cause issues. We got the warning below from a Python application.

/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning

After a little research we found out, this

Vagrant on Windows 7 vs Ubuntu 14.04

Dedunu
My whole team had to work on a project which uses Vagrant. Most of us had 8GB of memory, except one unfortunate intern. He had only 4GB of memory on his workstation. All the team members could spawn Vagrant machines without a problem except him. So we requested more memory and insisted that the IT department upgrade his workstation to 8GB. Oh no! ...

How to specify ReleaseLabel for EMR cluster with Boto2

Dedunu
Boto is the AWS SDK for Python. You can create clusters, instances or anything else using Boto. But sometimes Boto imposes limitations. I wanted to create an EMR cluster with ReleaseLabel 4.2.0. But we were using Boto2. ReleaseLabel is an option in Boto3. For Boto2 there was no documented option for ReleaseLabel. So I found a way to create EMR (Elastic ...

List all the links from RSS link using Python

Dedunu
Blogs and news sites use RSS (Rich Site Summary) feeds. Python can be used to fetch updates. I have written a simple program which can fetch an RSS feed and print its links. I have written the same application in both Python 2.7 and Python 3. In Python 3, urllib2.urlopen() is replaced with urllib.request.urlopen(). The Python 3 code is shown below.
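A minimal Python 3 sketch of such a program, assuming a standard RSS 2.0 feed where each item has a link element. It is not the original program from the post; the function names and the sample feed are made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET


def extract_links(rss_xml):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


def fetch_links(feed_url):
    """Download a feed with urllib.request.urlopen() and extract its links."""
    with urllib.request.urlopen(feed_url) as response:
        return extract_links(response.read())


# Inline sample feed so the parsing can be tried without network access.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example</title><link>http://example.com/</link>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
</channel></rss>"""

for link in extract_links(SAMPLE_FEED):
    print(link)
```

For a real feed, call fetch_links() with the feed URL. Only item links are returned, so the channel's own link is skipped.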

Remove all the followers from your twitter account

Dedunu
You might use social networks often. All of us know that it is really hard to do bulk operations on Facebook and Twitter. I wanted to remove all the followers from my Twitter account. So I googled it. Then I found out there is no way to remove followers. The only way is to block them and then unblock them. But this way if ...

How to fix Incompatible clusterIDs in Hadoop?

Dedunu
When you are installing and trying to set up your Hadoop cluster, you might face an issue like the one below.

FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid unassigned) service to master/192.168.1.1:9000. Exiting.
java.io.IOException: Incompatible clusterIDs in /home/hadoop/hadoop/data: namenode clusterID = CID-68a4c0d2-5524-486e-8bc9-e1fc3c5c2e29; datanode clusterID = CID-c6c3e9e5-be1c-4a3f-a4b2-bb9441a989c5

I just quoted the first two lines of the error. But full

Hadoop MultipleInputs Example

Dedunu
Let's assume you are working for ABC Group. They have ABC America airline, ABC Mobile, ABC Money, ABC hotel, blah blah. ABC this and that. So you have multiple data sources, and they have different types and columns. So you can't run a single Hadoop job on all the data. You got several data files from all these businesses. (Edited this data file ...

IMAP Java Test program and JMeter Script

Dedunu
One of my colleagues wanted to write a JMeter script to test IMAP. But that code failed, so I also got involved. JMeter BeanShell uses Java in the backend. First I tried with a Maven project. Finally I could write code to list the IMAP folders. The Java implementation is shown below. Then we wrote code to ...

Increase memory and CPUs on Vagrant Virtual Machines

Dedunu
In the last post, I showed how to create multiple nodes in a single Vagrant project. Usually the "ubuntu/trusty64" box comes with 500MB of memory. Some developers need more RAM and more CPUs. In this post, I'm going to show how to increase the memory and the number of CPUs in a Vagrant project. Run the commands below.

mkdir testProject1
cd testProject1
vagrant init

Then edit the Vagrant file like ...

Multiple nodes on Vagrant

Dedunu
Recently I started working with Vagrant. Vagrant is a good tool that you can use for development. In this post, I'm going to explain how to create multiple nodes in a Vagrant project.

mkdir testProject
cd testProject
vagrant init

If you run the above commands, they will create a Vagrant project for you. Now we have to make changes to the Vagrant file. Your initial ...

Alfresco 5.0.1 Document Preview doesn't work on Ubuntu?

Dedunu
I recently installed Alfresco for testing in a Vagrant instance. I used an Ubuntu image for the Vagrant instance. But I forgot to install all the libraries which must be installed on Ubuntu before you install Alfresco. Fortunately, Alfresco worked without those dependencies.

http://docs.alfresco.com/5.0/concepts/install-lolibfiles.html

The above link lists the libraries you should install before you install Alfresco. You ...

Yosemite Full Screen problem :(

Dedunu
People hate Yosemite. But I don't know why. I like Yosemite more than Mavericks. But Yosemite has a problem with the Maximize (Zoom) button. When you click the Zoom or Maximize button, the window goes to full screen mode. To avoid this, press Zoom while holding the Alt key. Enjoy Yosemite!