Big Data Integration and Processing

At the end of the course, you will be able to:

* Retrieve data from example database and big data management systems
* Describe the connections between data management operations and the big data processing patterns needed to utilize them in large-scale analytical applications
* Identify when a big data problem needs data integration
* Execute simple big data integration and processing on Hadoop and Spark platforms

This course is for those new to data science. Completion of Intro to Big Data is recommended. No prior programming experience is needed, although the ability to install applications and utilize a virtual machine is necessary to complete the hands-on assignments. Refer to the specialization technical requirements for complete hardware and software specifications.

Hardware Requirements: (A) Quad Core Processor (VT-x or AMD-V support recommended), 64-bit; (B) 8 GB RAM; (C) 20 GB disk free.

How to find your hardware information: (Windows): Open System by clicking the Start button, right-clicking Computer, and then clicking Properties; (Mac): Open Overview by clicking on the Apple menu and clicking “About This Mac.” Most computers with 8 GB RAM purchased in the last 3 years will meet the minimum requirements. You will need a high-speed internet connection because you will be downloading files up to 4 GB in size.

Software Requirements: This course relies on several open-source software tools, including Apache Hadoop. All required software can be downloaded and installed free of charge (except for data charges from your internet provider). Software requirements include: Windows 7+, Mac OS X 10.10+, Ubuntu 14.04+ or CentOS 6+; VirtualBox 5+.

Status: Database Management Systems

Status: Analytics

Beginner · Course · 17 hours

Featured reviews

4.0

Reviewed Oct 8, 2017

Lot of new information, excellent delivery. Given 4 as I feel real-use case flavor is inadequate; exercises could be more intensive, real case studies can be added.

4.0

Reviewed Sep 27, 2017

Good coverage of the issues. Splunk exercise smelled like a promotional thing, not very useful. I think the exercises should be harder and on more complex examples.

4.0

Reviewed Dec 3, 2016

Some of the lectures seemed slightly lesser quality with regards to the materials. For moocs especially, I would like to have the lecture better documented in order to download and review later.

4.0

Reviewed Mar 22, 2020

There is little instruction for the final task (either for the other tasks). And I'm confused by the comments in the jupyter for a long time. You have to google many things to complete the task.

5.0

Reviewed Mar 5, 2018

It was a good course, it could have been better if some examples of Spark were also provided in other Languages like Java, people without having background of python may find it difficult.

5.0

Reviewed Oct 21, 2020

Hello Gentlemen, This course was very helpful for me. It enhanced my knowledge about Big Data Integration. Thank you so much for providing me such important knowledge. Thank you once again.

4.0

Reviewed Jun 22, 2018

It's a bit tough for students having less knowledge in programming to go through Week 6. Kindly revise the study material to help learners cope with the challenges of the Week 6 quizzes.

5.0

Reviewed Oct 7, 2017

Very interactive course. Theoretical classes are nicely drafted. Hands-on exercises are interesting and some are challenging too. Overall very interesting course. Happy learning.

4.0

Reviewed Feb 20, 2019

The quiz was a bit difficult since there was not much guidance on how to sort in descending order and how to find the total times a country was mentioned in a single tweet.

4.0

Reviewed Jul 22, 2018

Great course and content; I just wish I had felt better prepared for the last challenge with Spark in week 6. But still an excellent course and hands-on exercises.

4.0

Reviewed Aug 4, 2017

I was a beginner in the area of Big Data and I selected this course due to my interest and curiosity. But the lectures were very clear and I could successfully complete the course.

5.0

Reviewed Oct 20, 2020

this course is great. each material is taught in great detail from the video explanation, also accompanied by material document slides, and there are many quizzes for...

All reviews

Showing: 20 of 514

Scott Monson

3.0

Reviewed Mar 19, 2018

I found the 6th week of this course to be frustrating. There was a big jump from the lessons to the final 2 tests, and the questions and directions were not well worded, a bit confusing. The biggest issue was that many students had the same questions that were blocking their progress, however there were almost no replies from teachers or staff to give some guidance, tips, etc. Some of these questions were asked over a year ago and the new students had the same questions again, and still no real activity from the teachers in the forums. In past classes that always happened.

Jason Ross

2.0

Reviewed Oct 21, 2017

This course continued the trend of this specialization where the lectures are full of vague jargon/diagrams and name-dropping of various applications without teaching us practical skills and then quizzing us on whether we listened to the video verbatim as opposed to challenging our minds conceptually. Only the exercises are redeeming in giving some useful, hands-on experience with some applications but then the final project required extensive googling to figure out how to work with pyspark dataframes that weren't taught in the course. Instead this course should have just been full of hands-on teaching of pyspark, mongodb, and python. Also the splunk module was a total tangent/distraction and should be dropped.

Dhanu Saputra

1.0

Reviewed Apr 21, 2019

Installation for pyspark is not working properly.

Dwayne D.

4.0

Reviewed Jul 27, 2018

This course provides a good overview and positioning of relevant big data technologies. In the latter weeks, the hands-on exercises become increasingly challenging, which is a good thing. I have a much better grasp of Apache Spark and its role in big data processing and integration as a result of this course.

My only significant complaint (and why I rated 4 stars vs. 5) is that the setup instructions for the environment needed for the hands-on exercises need to be updated. I spent 1.5 days (in terms of time I allocate to continued learning) struggling with configuration of the final exercise. The forum was useful. It appears most learners who take this course after June 2018 will run into the same issue. The course administrators should update the instructions so that others won't lose time or, worse, give up on the course because of issues indirectly (at best) related to the learning objectives.

All in all, this is a very good course; and I'd recommend it.

Alireza Alex Bani

2.0

Reviewed Oct 21, 2017

Lecture material and instructions are very limited and confusing. There are so many places where the order of the steps to perform certain tasks is flipped, making students spend several nonsensical hours. I wish somebody would care and review the material and fix all these issues!!!

Sylvain Ogier

5.0

Reviewed Apr 2, 2020

As I am not familiar with the VM and its environment, I spent more time struggling with the VM paths and initialization, even with the pre command sets, than doing the computation of the data.

Phillip Maddux

2.0

Reviewed Sep 3, 2017

I have to give this course a low rating, simply because the week 6 assignment "Analysis using Spark" was a terrible experience. All other assignments throughout this course have been great, but the "Analysis using Spark" assignment was poorly constructed. Essentially the assignment could not be completed as prescribed in the instructions. The data required modifying in order to complete the exercise, which I was never able to complete. The goal of the exercise was to use what we learned from the lessons and work with data frames, not deal with and repair broken CSV data. This was extremely disappointing!

harouna moumouni komoye

4.0

Reviewed Sep 12, 2020

The content of the course is good, but the Cloudera VM does not work properly. Please try to fix these bugs on the Cloudera VM; many things don't work correctly, like MongoDB and pyspark. Thank you.

Yuri Campbell

1.0

Reviewed Mar 3, 2022

I really liked the instructors. They have a really good way of showing the content and are very helpful. However, none of the examples in this whole course will work. The images used in the virtual machines are so old that they cannot even be updated anymore, because the servers changed and/or the Linux distribution is not even developed anymore. Coursera has to immediately update this course and all the courses in this specialization! This is not fair to the student. Please, Coursera! I have always supported the work on the platform. This was a great disappointment.

Ali Adnan

5.0

Reviewed Mar 6, 2018

It was a good course, it could have been better if some examples of Spark were also provided in other Languages like Java, people without having background of python may find it difficult.

Sam Mallisetti

4.0

Reviewed Feb 18, 2018

The final assignment contained concepts that were not taught in the course: for example, how to remove leading space from a field, how to put 2 words in a tuple, how to filter lines/texts with null, how to deal with country names with more than one word (e.g. United States), etc. The final assignment required far more advanced Spark programming skills than what was taught in the rest of the course.

Brian Moore

3.0

Reviewed Oct 13, 2018

The lectures are so mundane and abstract. I enjoy the hands on portions, but they are so disconnected from the lectures when concepts are to be explained, that they ultimately feel like we're just executing code from the material that we may not fully understand. By the time the quizzes come up, we're expected to put everything together as if we're regularly practicing the methods used in the hands-on portions. Very disappointed.

Manoj Dhoke

3.0

Reviewed Oct 29, 2018

The course content is good, however, main issue is with the hands on and assignments instructions - they are not completely clear and lack many things.

José Antonio Ribeiro Neto

5.0

Reviewed Oct 12, 2017

Rate this Course

My name is Jose Antonio. I am looking for a new Data Scientist career (https://www.linkedin.com/in/joseantonio11)

I did this specialization to get new knowledge about Big Data and better understand the technology and its practical applications.

The course was excellent and the classes well taught by teachers.

Congratulations to Coursera team and Instructors.

Regards.

Jose Antonio.

Sagar

5.0

Reviewed Dec 12, 2019

Mainly I have learned the big data structure and the technologies which are used to control the flow of the data. Practical explanation was really good. This course also gave a basic idea about the machine learning algorithms which are used in big data processing, such as classification, clustering, etc. I really enjoyed the learning journey :)

Deleted Account

5.0

Reviewed Nov 28, 2017

I thoroughly enjoyed doing the course. At the end of the course, doing the last exercise provided many challenges which are encountered in real time job; the helpful posts from mentors, teaching staff and other fellow students gave me invaluable insight. Would expect more from Coursera on Apache Spark and NoSQL database courses.

Chinmaya barik

5.0

Reviewed Jun 19, 2017

This course gave me a clear idea about how such humongous data are collected from various sources and processed in a Big Data platform. This course also covers some very good tools to practice. Also, lectures are excellent in explaining concepts. Overall it's a very good experience taking this course. Thank you.

humza tufail

5.0

Reviewed Dec 5, 2016

Wonderful course, but the information is too condensed for beginners. The course is hard because of lots of errors, with no solutions even on the Cloudera community blog. This was my first experience with MongoDB; after this course I am able to say that yes, I know how to work with Mongo and Spark at a beginner's level.

Basil Chua

5.0

Reviewed Feb 11, 2020

The last 2 assignments were really challenging. In hindsight, it has provided a holistic view on the use of MongoDB and Spark in ingesting, processing and transforming data. It required real perseverance in researching and trying out "theories". It was enriching and rewarding but not for the fainthearted.

Sukanta Mondal

5.0

Reviewed Nov 15, 2017

I was looking for this type of course of BigData. I have spent hours to read through different blogs and articles. But couldn't get better idea/direction how to start or where to start. This is ideal course for getting started on Big Data. I enjoyed all the slides and hands on very much. Thank you.