11 Final Project

11.1 Brainstorming Ideas



Eric Lind’s Metro Transit presentation (which you can find on Moodle) outlines the background for our course project. Directions:

  • Carefully review this presentation before completing the exercises below.
  • We want this brainstorming activity to provide a rich set of potential research topics! Thus you should not collaborate with others on this assignment.



  1. Identify and describe two Metro Transit related research topics that interest you. These topics should be broad (eg: “public transit for the elderly”) and your descriptions short (1-2 sentences each).

    The specifics: One of the topics should be chosen from the list of research topics / areas of interest identified in Lind’s presentation. The other should be unique - something Lind did not discuss during his visit. If you have trouble thinking of something, consider the following prompt for the second research topic: What non-transit topics interest you (eg: geographic areas, academic areas, hobbies, current events)? Try to write down many of these and find one that has a connection to transit.


  2. For both topics, quickly skim a background article (e.g. Wikipedia page) or recent news events and look for aspects of the topic that are associated with data. These could be financial information, scientific measurements, demographic or geographic data, etc. Write down as many of these as you can think of.


  3. For both topics, think about what information may be available on web pages or social media (Twitter, Facebook, etc.) about your topic. Write this down.


  4. Look over the lists of public datasets and public APIs we discussed in class. For both topics identify the sources of data that may be relevant. Write down their name and what information they contain.


  5. For both topics write down two questions you might be able to answer using data. They can be broad at this point.













11.2 Data Collection & Visualization

Each of you has been assigned to a Metro Transit project group based on your responses to a survey. Each group will work together to:

  • identify a topic of interest;
  • identify research questions related to that topic (as much as possible, no two groups should pursue the same research questions);
  • obtain data required to explore those research questions;
  • conduct a data analysis that informs the research questions; and
  • communicate findings both in a written report & public presentation.

The aim of checkpoint 1 is to simply make progress on the project goals outlined above. With this in mind, each group will hand in 1 html (knit Rmd) document for the following exercises. Throughout the exercises, keep in mind that your research topics, questions, and data might evolve throughout the course of the project. This checkpoint is simply a first step in narrowing these down.


  1. Names & workflow
    List all group members. Though you do not have to write it down here, discuss a plan for completing this assignment. How will you put together your shared Rmd? How will you divide the work? When will you meet to work on the assignment?


  2. Topic & research questions
    1. Review the set of brainstorming ideas. As a group, identify a common research topic of interest. This should be related to Metro Transit and broad (eg: “accessibility of public transit”).

    2. Specify two potential research questions that (1) are related to the topic above; and (2) you can (at least partially) answer using data. At least one of these must require data not provided by Metro Transit.


  3. Data
    Identify the data sets you will use in this assignment to explore the research questions above. At least one of these must be new (not provided by Metro Transit). For each data set, summarize the following:
    • data source;
    • data description - what’s being measured?;
    • data limitations (eg: are the data recent? do they contain all variables you might want?);
    • data dimensions - how much data do you have?


  4. Research question 1
    Restate research question 1. Similar to Activity 3.5 (Visualization Wrap-Up), construct 4 relevant visualizations that provide insight on / help answer this question and piece these together to tell a short story. 1-2 sentences per visualization suffice.


  5. Research question 2
    Repeat exercise 4 for research question 2.


  6. Contributions
    Summarize each group member’s specific contributions to this checkpoint assignment.

11.3 Progress Reports

In the previous Data Collection & Visualization checkpoint, your group narrowed the focus of your Metro Transit project, collected the data required to pursue this project, and used data visualizations to tell a story that informed your research questions. For this next assignment, you will continue to make progress on your project and present your work to the class in a 5-minute talk.

Each group will submit 1 html (knit Rmd) document and a link to 1 set of Google presentation slides.



  • Progress report
    Submit 1 html (knit Rmd) for your group that contains the following:
    1. List all group members at the top.
    2. During class we will work together to identify one to three follow-up questions from the previous assignment. List those questions here and create a data visualization that provides insight into each one. NOTE: Though the progress report does not have to be as polished as your final paper, it must still be professional. Before handing it in, make sure to:
      • Edit! Correct grammatical errors / typos, ensure that the report flows well from one idea to the next, etc.
      • Check the formatting of the knit html document. Is it easy to follow? Are there goofy formatting mistakes?
      • Hand off your report to at least one friend outside of the class and ask them to interpret your visualizations. If they cannot interpret your work, go back and make the appropriate changes. After all, the final reports will be written with a broad audience in mind.
      • Eliminate unnecessary RStudio code.
    3. Summarize each group member’s specific contributions to this assignment.



  • Presentation slides
    Use Google Slides. Share a link to your slides with Alicia and Shilad, and copy and paste the link to your slides in the Moodle assignment text field. Put together a set of slides that you will use in a 5-minute presentation that summarizes your group’s research topic, progress, and future directions/goals. Post your slides to Moodle. Be thoughtful about the following features when constructing these slides:
    • font size
      Your audience wants to be able to read your text. Use 28-32 point font size.
    • text vs images/visualizations/charts
      Use text sparingly - slides with a lot of text are difficult for the audience to process (and they’re boring). Instead, utilize images/visualizations/charts.
    • aesthetics
      Keep your slides simple and neat. Overloaded slides are distracting.
    • presentation length
      The 5-minute presentation limit is strict. Of course, you will not be able to present everything you’ve done in these 5 minutes. Instead, focus on the motivation behind your research questions (why are they relevant?), a small selection of plots that demonstrate progress and provide insight, and your next project goals.



  • Presentations
    Each group will give a 5-minute progress report to the class using the slides as a visual aid. Remember the following:
    • Be confident! You know more about this project than anyone else in the room.
    • Speak clearly and make eye contact with the audience.
    • Practice.
    • Remember your audience. Don’t use jargon that wouldn’t be understood by a broad audience.
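    Not everybody has to talk during this presentation, but all students will talk at some point: any group member who does not speak here must speak during the final presentation.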

11.4 Final Presentations



Each group will put together an 8-10 minute presentation of their Comp 112 project. Each group member must actively participate in the construction of the presentation. Further, any group member that did not speak during the 5-minute progress report presentations must speak during the final presentation. Here are some other presentation guidelines:

  • audience
    You should gear your talk toward Eric Lind (Metro Transit), Dan Frankowski (Pinterest), and other data scientists that might be in attendance.


  • slides
    Effective slides will:
    • be legible;
    • be clean, engaging, and utilize effective (not distracting / unrelated) graphics, pictures, color, etc;
    • not contain a lot of text.


  • content
    • Motivate your work & specify clear research questions.
    • Explain your data resources and collection process.
    • Don’t try to present everything you’ve done throughout the project. Rather, pick the most insightful and cohesive aspects.
    • Tie your work together with a cohesive story rather than a list of things you did.
    • You, not your audience, are the experts on your topic. Thoroughly discuss and present any notation, terms, plots, etc.
    • Use data visualizations / analyses that are easy to interpret. (Not sure if this is the case? Show your visualization to at least one friend outside of the class and ask them to interpret it. If they cannot interpret your work, go back and make the appropriate changes.)
    • Don’t use RStudio notation (eg: data$height_at_10) in your text, visualizations, etc.
    • Showcase your work! For example, you can highlight the sophisticated skills behind your data collection, etc.


  • delivery
    • Practice explaining your work without a script. This is key to engaging with the audience and instilling confidence.
    • Slow down - take the time to get the audience on the same page as you.
    • Be confident & enthusiastic! You know more about your topic than anyone in the audience.







11.5 Final Reports

For the final report, each group will submit 1 html (knit Rmd) document with the following structure:

  • Title
    A descriptive title & list of all group members.

  • Introduction
    An introduction that motivates & outlines a clear, specific set of research questions.

  • Data collection
    Specification of your data sources and collection process.

  • Analysis
    A presentation of the group’s key findings and take-aways for Metro Transit.




Your reports will be graded with respect to the following:

  • Structure & layout
    The report follows the above structure and utilizes section/sub-section titles so that readers can easily navigate the report.

  • Storytelling & cohesion
    • Goals & research questions are clear.
    • Findings are woven together in a cohesive story, rather than presented as a list of distinct ideas.
    • You don’t try to present everything you’ve done throughout the project. Rather, you pick the most insightful and cohesive aspects.
    • Your report showcases the entire life cycle of your project (eg: data collection to conclusions).
  • Results
    • Your data visualizations / analyses are meaningful, ie. support the investigation of your research questions.
    • Your data visualizations / analyses are easy to interpret. (Not sure if this is the case? Hand off your report to at least one friend outside of the class and ask them to interpret your visualizations. If they cannot interpret your work, go back and make the appropriate changes.)
    • You thoroughly discuss any assumed notation / definitions and don’t use RStudio notation (eg: data$height_at_10) in your text, visualizations, etc.
  • Code
    • Code is commented and easy to follow.
    • Unnecessary code (including errors) is eliminated.
    • Printed data is limited to what is absolutely necessary for the reader.
  • Professionalism
    • There are no grammatical errors / typos.
    • The knit html document doesn’t contain formatting errors.
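
As a rough illustration of the Results and Code criteria above, the following R sketch reports a data set's dimensions, prints only a few rows, and labels a plot in plain language rather than with notation like data$height_at_10. The `heights` data frame, its values, and the label text are made up for this example (they are assumptions, not course or Metro Transit data), and the sketch assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Hypothetical data for illustration only; not a course or Metro Transit data set.
heights <- data.frame(
  child_id = 1:6,
  height_at_10 = c(135.2, 140.1, 138.4, 142.7, 136.9, 139.5)
)

# Report the data dimensions (rows, columns) instead of printing the whole data set.
dim(heights)

# Show the reader only the first few rows.
head(heights, 3)

# Label the plot in plain language rather than with the raw variable name.
height_plot <- ggplot(heights, aes(x = height_at_10)) +
  geom_histogram(bins = 5) +
  labs(
    x = "Height at age 10 (cm)",
    y = "Number of children",
    title = "Heights of children at age 10"
  )
height_plot
```

In a knit document, the same idea applies to every chunk: keep the output that supports your story and drop the rest.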

11.5.0.1 How to hand in your assignment

At least one member of the group should hand in the knit HTML to Moodle.

Every member of the group should individually describe what work each group member contributed to the final project in the text input field of their own Moodle assignment. If you or another member of the group contributed substantially more or less work overall, make this clear.