Posts

Showing posts from August, 2014

Bootable Windows 7 x64 Install Flash Drive

Create a Bootable Windows 7 x64 Install Flash Drive from 32-bit Windows

Creating a bootable Windows 7 x64 flash drive from within a 32-bit install of Windows is not as straightforward as it may seem. I recently had to go through this process myself, so I’ll document the steps below.

Things you’ll need

Windows 7 x64 disc image
Windows 7 USB/DVD Download Tool
32-bit bootsect.exe

Create the installer

1. Install the Windows 7 USB/DVD Download Tool.
2. Extract the 32-bit bootsect.exe file to the directory that the Windows 7 USB/DVD Download Tool was installed to. This is usually something like “C:\Users\username\AppData\Local\Apps\Windows 7 USB DVD Download Tool”. (Otherwise the tool tries to run the 64-bit bootsect.exe from the x64 image, which 32-bit Windows cannot execute.)
3. Run the Windows 7 USB/DVD Download Tool and select your Windows 7 disc image.
4. Follow the remaining steps in the tool, and your flash drive should be created successfully!

If you’ve followed these steps and your flash installer was created successfully, then your next step is, of course, to install Windows 7! Do

Command Line Arguments to Pig Scripts

Parameter Placeholder

First, we need to create a placeholder for the parameter that needs to be replaced inside the Pig script. Let’s say you have the following line in your Pig script, where you are loading an input file:

INPUT = LOAD '/data/input/20130326';

In the above statement, if you want to replace the date part dynamically, then you have to create a placeholder for it:

INPUT = LOAD '/data/input/$date';

Individual Parameters

To pass individual parameters to the Pig script, we can use the -param option while invoking the Pig script. So the syntax would be:

pig -param date=20130326 -f myfile.pig

If you want to pass two parameters, then you can add one more -param option:

pig -param date=20130326 -param date2=20130426 -f myfile.pig

Param File

If there are a lot of parameters that need to be passed, or if we need a more flexible way to do it, then we can place all of them in a single file and pass the file name using the -param_file option. The pa
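To make the substitution step concrete, here is a minimal Java sketch of what happens conceptually before the script runs: every $name in the script text is replaced by the value supplied with -param or read from a param file. This is an illustration under assumptions, not Pig's actual preprocessor. The class PigParamSketch and its methods are hypothetical, the param-file parsing assumes one name=value pair per line with # comments, and the real preprocessor handles more cases (for example %declare and %default).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of Pig-style parameter substitution (not Pig's real code).
public class PigParamSketch {
    // A placeholder is '$' followed by an identifier, e.g. $date or $date2.
    // The greedy identifier match means $date2 is never mistaken for $date + "2".
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Parses "name=value" lines; blank lines and lines starting with '#' are skipped.
    // This file format is an assumption made for the sketch.
    static Map<String, String> parseParamFile(String contents) {
        Map<String, String> params = new HashMap<>();
        for (String line : contents.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            int eq = trimmed.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Malformed parameter line: " + line);
            }
            params.put(trimmed.substring(0, eq).trim(), trimmed.substring(eq + 1).trim());
        }
        return params;
    }

    // Replaces every $name in the script text with its value; an unknown name is an error.
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PLACEHOLDER.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: $" + m.group(1));
            }
            // quoteReplacement stops '$' or '\' inside the value from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Mimics: pig -param date=20130326 -f myfile.pig
        Map<String, String> params = new HashMap<>();
        params.put("date", "20130326");
        System.out.println(substitute("INPUT = LOAD '/data/input/$date';", params));
        // Prints: INPUT = LOAD '/data/input/20130326';
    }
}
```

Treating a missing value as an error, rather than leaving $name in place, keeps a typo in a parameter name from silently loading the wrong path.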

Learn Apache Sqoop

Sqoop
Learn Apache Sqoop Installation
Apache Sqoop Tutorial Part 1
Apache Sqoop Tutorial Part 2
Apache Sqoop Tutorial Part 3
Apache Sqoop Tutorial Part 4
COURSE VIDEOS: ANALYZING BIG DATA WITH TWITTER

Microsoft Hadoop HDInsight (Hadoop on Windows Azure)

Windows Azure HDInsight Service, formerly known as Hadoop on Windows Azure, is now available inside the Windows Azure Preview portal. Hadoop-based big data tools are what I call the WMD(P), or Weapons of Mass Data Processing. (You heard it here first!) This is a very exciting development, and I would like to take a moment to recognize the great work our HDInsight team has done to pull this off.

Signing up

1. To sign up, log in to your Windows Azure account at http://www.windowsazure.com. At the bottom of the portal, click New (+).
2. Then click DATA SERVICES, followed by HDInsight. Use the preview program link to navigate to the sign-up page. On the Preview features page, also accessible from https://account.windowsazure.com/PreviewFeatures/, click "try it now" next to Azure HDInsight Preview. Please be warned that this might take up to a few days.

Getting Started and more Learning Content

Overview presentation

Check out a copy from:

Pig UDF: Eval

Apache Pig: UDF

Apache Pig: Writing a User Defined Function (UDF)

Preface: In this post we will write a basic demo custom function for Apache Pig, called a UDF (User Defined Function). A Java UDF for Pig extends the abstract class EvalFunc. This abstract class has an abstract method, exec, which the user needs to implement in a concrete class with the appropriate functionality.

Problem Statement: Let’s write a simple Java UDF that takes as input a Tuple of two DataBags and checks whether the second DataBag (treated as a set) is a subset of the first. For example, assume you have been given a tuple of two DataBags, where each DataBag contains elements (tuples) holding a single number.

Input:
Databag1 : {(10),(4),(21),(9),(50)}
Databag2 : {(9),(4),(50)}
Output: True

The function should return true, as Databag2 is a subset of Databag1.

From an implementation point of view, as we are extending the abstract class EvalFunc, we will be implementing the exec function. In this
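The core check the UDF has to perform can be sketched in plain Java. This is a minimal sketch under assumptions, not the post's implementation: plain Java lists stand in for the two DataBags so the code runs without Pig on the classpath, and the class names BagSubsetSketch and IsSubset are hypothetical. In an actual UDF the same logic would sit inside exec(Tuple) of a class extending EvalFunc<Boolean>, iterating over the tuples of the two bags and assuming the tuples' equals/hashCode compare their fields.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone version of the subset check (not a real Pig UDF).
public class BagSubsetSketch {
    /*
     * In a Pig UDF this logic would live inside something like:
     *   public class IsSubset extends EvalFunc<Boolean> {
     *       public Boolean exec(Tuple input) throws IOException { ... }
     *   }
     * with the two bags taken from input.get(0) and input.get(1).
     */

    // Returns true if every element of candidate also occurs in superset.
    // Duplicates are ignored, i.e. both bags are treated as sets.
    static <T> boolean isSubset(List<T> superset, List<T> candidate) {
        // A hash set gives O(1) average membership tests instead of scanning the list each time.
        Set<T> lookup = new HashSet<>(superset);
        return lookup.containsAll(candidate);
    }

    public static void main(String[] args) {
        // The example from the problem statement above.
        List<Integer> databag1 = Arrays.asList(10, 4, 21, 9, 50);
        List<Integer> databag2 = Arrays.asList(9, 4, 50);
        System.out.println(isSubset(databag1, databag2)); // prints true
    }
}
```

Building the hash set from the first bag makes the whole check roughly linear in the combined size of the two bags, rather than quadratic as with nested loops.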