LIRNEasia collaborates with the University of Dhaka Data and Design Lab to perform policy-related research. A research workshop was held at the University of Dhaka, where I conducted the Apache Spark and Hadoop sessions. [Photos: the research team; Sriganesh sharing his experience in policy-related research; the Apache Hadoop hands-on session.] If you are interested in Big Data research opportunities, please check this link ...
There are two types of data sets based on time.

Cross-Section Data
Cross-section data is collected at a single point in time. There might be different variables, but all of them are collected for the same period, so you won't see a time column.

Time-Series Data
Time-series data spans multiple periods: the same variable is recorded for different time periods.
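To make the contrast concrete, here is a minimal sketch in Python using pandas (the library choice and all figures are my own illustrations, not from the post):

import pandas as pd

# Cross-section data: several variables, all observed at the same
# point in time, so there is no time column.
cross_section = pd.DataFrame({
    "district": ["Dhaka", "Chittagong", "Khulna"],
    "population_millions": [14.4, 7.6, 2.3],   # illustrative figures
    "literacy_rate": [0.74, 0.70, 0.68],
})

# Time-series data: the same variable recorded across periods,
# so every row carries a time value.
time_series = pd.DataFrame({
    "year": [2013, 2014, 2015, 2016],
    "population_millions": [14.0, 14.1, 14.3, 14.4],
})

print(cross_section)
print(time_series)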
Quantitative Variables
Variables which can be measured numerically.

Discrete Variables
Countable variables are discrete variables. They do not need to be whole numbers.

Continuous Variables
Uncountable variables are continuous variables. This contrasts with discrete variables.

Qualitative Variables
Non-numerical variables are called qualitative variables. Sometimes qualitative variables are represented by numbers, but it is meaningless to perform arithmetic operations on those variables. ...
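A small sketch of these variable types in Python with pandas (the data set and column names are hypothetical):

import pandas as pd

survey = pd.DataFrame({
    "children":     [0, 2, 1],              # discrete quantitative: countable
    "height_cm":    [171.5, 158.2, 180.9],  # continuous quantitative: measured
    "blood_type":   ["A", "O", "B"],        # qualitative: non-numerical
    "satisfaction": [1, 3, 2],              # qualitative coded as numbers
})

# Marking the coded column as categorical makes it explicit that
# arithmetic on the codes (e.g., taking their mean) would be meaningless.
survey["satisfaction"] = survey["satisfaction"].astype("category")
print(survey.dtypes)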
Point Estimation
A point estimate is a single statistic inferred from a sample data set; it is the closest guess to the population parameter.

Interval Estimation
Interval estimation describes a range that is likely to contain the population value. This contrasts with point estimation.
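As a quick illustration, here is a Python sketch computing both kinds of estimates for a population mean (the sample values are made up, and the interval uses the normal approximation, which is my simplification):

import math
import statistics

sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]  # hypothetical sample

n = len(sample)
mean = statistics.mean(sample)                # point estimate of the mean
se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean

# 95% interval estimate: z = 1.96 under the normal approximation;
# a t-critical value would be more accurate for a sample this small.
lower, upper = mean - 1.96 * se, mean + 1.96 * se

print(f"Point estimate: {mean:.2f}")
print(f"95% interval estimate: ({lower:.2f}, {upper:.2f})")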
Today, I'm going to show you how to write a sample word count application using Apache Spark. For dependency resolution and build tasks, I'm using Apache Maven. However, you can use SBT (Simple Build Tool) instead. Most Java developers are familiar with Maven, hence I decided to show an example using Maven. This application is pretty much similar ...
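The post's example is written in Java and built with Maven; for readers who just want the shape of the algorithm, here is a minimal PySpark equivalent (the input path is a placeholder):

from pyspark import SparkContext

sc = SparkContext("local", "WordCount")

counts = (sc.textFile("input.txt")               # placeholder input path
            .flatMap(lambda line: line.split())  # split each line into words
            .map(lambda word: (word, 1))         # pair every word with a count of 1
            .reduceByKey(lambda a, b: a + b))    # sum the counts per word

for word, count in counts.collect():
    print(word, count)

sc.stop()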
All of us are interested in doing brilliant things with data sets. Most people use Twitter data streams for their projects, but there are a lot of free data sets on the Internet. Today, I'm going to list a few of them. I found almost all of these links in a Lynda.com course called Up and Running with Public Data Sets. ...
After you update the kernel, you need to run vboxdrv setup. But if you are trying to compile it for the first time or after removing the build-essential package, you might see the error below.

$ sudo /etc/init.d/vboxdrv setup
[sudo] password for user:
Stopping VirtualBox kernel modules ...done.
Recompiling VirtualBox kernel modules ...failed!
  (Look at /var/log/vbox-install.log to find out what went wrong)

$ cat /var/log/vbox-install.log
/usr/share/virtualbox/src/vboxhost/build_in_tmp: 62: /usr/sh
I wrote a blog post about Boto2 and EMR clusters a few months ago. Today I'm going to show how to create EMR clusters using Boto3. Boto3 documentation is available at https://boto3.readthedocs.org/en/latest/.
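A minimal sketch of what creating a cluster with Boto3 looks like, assuming placeholder values for the region, instance types, roles, and log bucket (none of these come from the post):

import boto3

client = boto3.client("emr", region_name="us-east-1")  # placeholder region

response = client.run_job_flow(
    Name="sample-cluster",
    ReleaseLabel="emr-4.2.0",             # placeholder EMR release
    LogUri="s3://my-bucket/emr-logs/",    # placeholder log bucket
    Instances={
        "MasterInstanceType": "m3.xlarge",
        "SlaveInstanceType": "m3.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,  # keep the cluster up after steps finish
    },
    JobFlowRole="EMR_EC2_DefaultRole",    # default EMR roles
    ServiceRole="EMR_DefaultRole",
)

print(response["JobFlowId"])  # ID of the newly launched cluster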
You might have Hadoop in production, with terabytes of data residing in it. HDFS metadata can get corrupted, and the Namenode won't start in such cases. When you check the Namenode logs, you might see exceptions like this:

ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:178)
    at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
    at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
    at org.apache.hadoop.dfs.FSEditLog.loadFSE