October 20-22, 2014
University of Maryland, College Park
Big data and data science are enabled by scalable, distributed processing frameworks that allow organizations to analyze petabytes of data on large commodity clusters. MapReduce (especially the Hadoop open-source implementation) is the first, and perhaps most famous, of these frameworks. What's next? Well, Spark is (one) answer.
This is a two-and-a-half-day tutorial on the distributed programming framework Apache Spark.
The class will include introductions to the many Spark features, case studies from current users, best practices for deployment and tuning, future development plans, and hands-on exercises.
In addition, there will be ample time to mingle and network with other big data and data science enthusiasts in the metro DC area.
This tutorial is being organized by Jimmy Lin and jointly hosted by the iSchool and Institute for Advanced Computer Studies at the University of Maryland. The tutorial will be led by Paco Nathan and Reza Zadeh.
The event will take place from October 20 (Monday) to 22 (Wednesday) in the Special Events Room in the McKeldin Library on the University of Maryland campus (actual room number is 6137). The tutorial will run all day Monday, all day Tuesday, and end at noon on Wednesday. The event is free for University of Maryland students and open to the general public for a nominal registration fee.
Update: we've filled up and registration is closed!
If you have any questions, feel free to contact Jimmy Lin.
Maps of the University of Maryland can be found here. McKeldin Library is at one end of the mall that runs across the center of campus; it looks like this and it's pretty hard to miss. Take the elevators up to the 6th floor to room 6137.
To find parking on campus, check out this link. Just a warning: allow ample time to get onto campus in the morning, especially if you arrive on the hour. Students getting to classes can clog up traffic, and it's not rare to sit at an intersection for more than ten minutes waiting for students to stream by.
Yes, we will be providing wireless access and coffee, probably the two most important ingredients of a successful technology tutorial. The power outlet situation, however, is a bit iffy. The room we are in does not have outlets at the seats, although there are outlets along the walls. Make sure your laptop is charged! Also, if you have a power strip conveniently lying around, please bring it so we can share.
The tutorial will start at 10am sharp, but doors open at 9am... we'll be around, and you're welcome to stop by and mingle.
The hashtag for the event is #fearthespark.
The first two days of the tutorial will be presented at the level of a CS freshman. We expect attendees to have some programming experience in Python, Java, or Scala.
Throughout the class, there will be hands-on exercises. You are expected to bring your own laptop for those, with the minimum system requirements:
If you're eager to get started, look through resources here.
The third (half) day of the tutorial will be presented at the level of a CS graduate student, focusing specifically on research on or with Spark.
Full house at the UMD Spark Tutorial!