De-Identified Personal Health Care System Using Hadoop
The use of medical Big Data is increasingly popular in health care services and clinical research. One of the biggest challenges for health care centers is the huge volume of data flowing into their systems daily; crunching this Big Data and de-identifying it with traditional data mining tools was problematic.
Therefore, to de-identify personal health information, a MapReduce application uses JAR files that contain a combination of MapReduce code and Pig queries. The application also uses Pig UDFs (User Defined Functions) to protect the health care dataset.
Responsibilities:
Moved all personal health care data from the database to HDFS for further processing.
Developed Sqoop scripts to enable interaction between Hive and the MySQL database.
Wrote MapReduce code for de-identifying data.
Loaded the processed results into Hive tables.
Generated test cases using MRUnit.
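The de-identification step above can be illustrated outside Hadoop. A minimal shell sketch of the masking logic a MapReduce mapper or Pig UDF would apply; the record layout (patient_id,name,ssn,diagnosis) is a hypothetical example, not the actual project schema:

```shell
# Minimal sketch of the masking logic a de-identification mapper or Pig UDF
# would apply. The record layout (patient_id,name,ssn,diagnosis) is a
# hypothetical example, not the actual project schema.
mask_record() {
  # Replace the name with a fixed token and keep only the last 4 SSN digits.
  echo "$1" | awk -F',' 'BEGIN{OFS=","} {
    $2 = "REDACTED"
    $3 = "XXX-XX-" substr($3, length($3) - 3)
    print
  }'
}

mask_record "1001,John Doe,123-45-6789,hypertension"
```

In the real pipeline this logic ran as MapReduce code packaged in a JAR, and the masked output was loaded into Hive tables.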
Best-Buy – Rehosting of Web Intelligence project
The purpose of the project is to store terabytes of log information generated by the e-commerce website and extract meaningful information from it. The solution is based on the open-source Big Data software Hadoop. The data is stored in the Hadoop file system and processed using Pig scripts. This in turn includes getting the raw HTML data from the websites, processing the HTML to obtain product and pricing information, extracting various reports from the product pricing information, and exporting the information for further processing.
This project is mainly a re-platforming of the existing system, which runs on Web Harvest (a third-party JAR) with a MySQL database, to Hadoop, which can process large data sets (terabytes and petabytes of data), in order to meet the client's requirements amid increasing competition from other retailers.
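One step of that pipeline, obtaining product and pricing information from raw HTML, can be sketched in shell. The markup below is a made-up fragment, not the actual retailer HTML, and in the project this parsing was done by Pig scripts:

```shell
# Hypothetical fragment of crawled product HTML (not the real retailer markup).
html='<div class="product"><span class="name">USB Cable</span><span class="price">$7.99</span></div>'

# Pull out the product name and price with sed capture groups.
name=$(echo "$html" | sed -n 's/.*class="name">\([^<]*\)<.*/\1/p')
price=$(echo "$html" | sed -n 's/.*class="price">\([^<]*\)<.*/\1/p')

echo "$name,$price"
```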
Responsibilities:
Moved all crawl data flat files generated from various retailers to HDFS for further processing.
Wrote Apache Pig scripts to process the HDFS data.
Created Hive tables to store the processed results in a tabular format.
Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
Involved in design, development, and testing.
Wrote script files for processing data and loading it into HDFS.
Wrote CLI commands for working with HDFS.
Developed UNIX shell scripts to create reports from Hive data.
Involved in the requirement analysis phase.
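The reporting step (UNIX shell scripts over Hive data) can be sketched as below. The tab-separated product/price data is fabricated for illustration; the real scripts consumed files produced by Hive queries:

```shell
# Sketch of a UNIX reporting script over Hive output. The tab-separated
# product/price data below is fabricated for illustration; the real scripts
# consumed files produced by Hive queries.
printf 'laptop\t899.00\nphone\t499.00\nlaptop\t949.00\n' > /tmp/hive_out.tsv

# Report the highest observed price per product, sorted by product name.
report=$(awk -F'\t' '{ if ($2 + 0 > max[$1] + 0) max[$1] = $2 }
    END { for (p in max) print p "\t" max[p] }' /tmp/hive_out.tsv | sort)

echo "$report"
```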
Questions and answers
Why should we hire you?
If you hire me, this role will be a great platform to exhibit my skills. Whatever goals I set, I ensure I complete them within the stipulated time.
Reason for leaving your last job?
In order to enhance my skills, I am looking for better opportunities.
Your ability to work under pressure?
I keep myself calm and stay focused while multitasking, and I remain patient.
Describe your management style?
I constantly keep tabs on the work assigned to my teammates and ensure it is completed before deadlines.