Final Project

For this project, you will use a dataset from the TidyTuesday project. You can choose any dataset released in 2024 as part of that project: https://github.com/rfordatascience/tidytuesday/tree/master/data/2024#readme. Your task is to come up with two questions, answer them with data visualizations, and write up and present your method and findings. You will also get to peer review others’ projects.

Dataset

Choosing a dataset is something you should do carefully but also relatively quickly (during the first lab session that you’re working on the project).

The dataset you choose should have some numerical and some categorical variables or you should be able to recode some of the existing variables so that you can ultimately have both numerical and categorical variables to work with.
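As a hedged illustration of the recoding mentioned above, the sketch below bins a numerical variable into a categorical one with base R's cut(). The built-in mtcars data stands in for a TidyTuesday dataset, and the cut points (20 and 30) and the column name mpg_group are assumptions chosen only for illustration.

```r
# mtcars ships with R; it stands in for your TidyTuesday dataset
dat <- mtcars

# Bin a numerical variable into three groups.
# The break points 20 and 30 are arbitrary, illustrative choices.
dat$mpg_group <- cut(
  dat$mpg,
  breaks = c(-Inf, 20, 30, Inf),
  labels = c("low", "medium", "high")
)

# Treat a numerically coded variable as categorical
dat$cyl <- factor(dat$cyl)

# Count how many rows fall in each new category
table(dat$mpg_group)
```

After a step like this, the dataset has both numerical variables (e.g., wt) and categorical ones (mpg_group, cyl) to work with.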

It is also very important that the dataset you choose allows for two distinct questions to be asked and answered using a not-completely-overlapping set of variables. For example, Question 1 could require variables x, y, and z, while Question 2 requires variables a, b, c, and d, or x, a, and b. Some shared variables are ok, but the two sets should not overlap completely, i.e., Question 2 can’t also require exactly variables x, y, and z.

Questions

Each of the two questions you come up with should involve more than two variables to answer. You should phrase them in a way that is within the scope of inference of your data. For example, if you have an observational dataset, you shouldn’t phrase your question in a causal way.

Workflow

  • Week 1 of project (week of May 13th): Choose a dataset and begin writing your proposal in proposal.qmd.

  • Week 2 of project (week of May 20): Continue working on your proposal in proposal.qmd.

  • Week 3 of project (week of May 27): Provide peer review to two other teams in the form of issues in their GitHub repos, and address the issues left on your team’s project repo by closing them with explicit commits. Feel free to get started on your presentation as well.

  • Week 4 of project (week of June 3): Review feedback from me on your proposal and close any remaining issues. Work on your presentation in presentation.qmd and write-up in index.qmd.

  • Week 5 of project (week of June 10): Continue working on your write-up and presentation.

  • Week 6 of project (week of June 17): Finalize your write-up, presentation, and your project website. Record presentation for class by Monday (June 24th).

Due dates

  • Proposals for peer review: due on Wednesday, May 29 at 5pm.
  • Revised proposals for teaching team review: due Monday, June 3 at 5pm.
  • Write-up and presentation: due Monday, June 24th at 5pm.

Deliverables

Proposal

Your proposal should include:

  • A brief description of your dataset including its provenance, dimensions, etc. (Make sure to load the data and use inline code for some of this information.)

  • The reason why you chose this dataset.

  • The two questions you want to answer.

  • A plan for answering each of the questions including the variables involved, variables to be created (if any), external data to be merged in (if any).

  • A weekly “plan of attack” outlining your steps to complete your project and including the team member(s) assigned to that task.

    • This should be in the following form:

      Task Name | Status | Assignee | Due | Priority | Summary
                |        |          |     | Low      |
                |        |          |     | Moderate |
                |        |          |     | High     |
    • Note that this is a living document and should be updated at minimum once a week.
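For the dataset description in the proposal, one possible pattern is to compute the dimensions in a code chunk and then refer to them with inline code. The sketch below assumes a hypothetical file data/my_dataset.csv; so that the example runs on its own, it uses the built-in mtcars data in place of that file.

```r
# In your proposal you would read your own file from the data folder, e.g.:
#   dat <- read.csv("data/my_dataset.csv")  # hypothetical file name
# mtcars (built into R) is used here so the sketch runs as-is.
dat <- mtcars

n_obs  <- nrow(dat)  # number of observations (rows)
n_vars <- ncol(dat)  # number of variables (columns)

# In the prose of proposal.qmd, inline code can then report these values:
#   The dataset has `r n_obs` rows and `r n_vars` columns.
```

Because the numbers come from the data itself, they stay correct if the dataset is later filtered or updated.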

Peer review

Reviewer tasks

Each team will review the proposals of two other teams. The peer review must be completed on Wednesday, May 29. On that day teams will have access to the project repos of the two teams whose work they’re reviewing. Reviews should start by cloning the team’s repo, re-rendering it locally to make sure you can reproduce their work, and then adding an issue to their repo with your peer review feedback.

The reviewer / reviewee assignments are as follows:

reviewer        | reviewee_1      | reviewee_2
lestrada        | jrflemi5        | rahathelal
jrflemi5        | rahathelal      | josholmes
rahathelal      | josholmes       | VeraJSolo
josholmes       | VeraJSolo       | cwjohn13
VeraJSolo       | cwjohn13        | ultraviolets369
cwjohn13        | ultraviolets369 | CristinaUAMS
ultraviolets369 | CristinaUAMS    | manasppanse
CristinaUAMS    | manasppanse     | derekriceX
manasppanse     | derekriceX      | iriscys
derekriceX      | iriscys         | dwiesneraz
iriscys         | dwiesneraz      | Data Duo
dwiesneraz      | Data Duo        | lestrada
Data Duo        | lestrada        | jrflemi5

Teams will develop the review together, with discussion among all team members, but only one team member will submit it as an issue on the project repo. To do so, go to the Issues tab, click on the green New issue button on the top right, and then click on the green Get started button for the issue template titled Peer review.

This will start a new issue with a peer review form that you can fill out. Make sure to update the introductory paragraph with your team name and the names of the team members participating in the review. Then, answer each of the questions in the spaces provided underneath them. You’re expected to be thorough in your review, but this doesn’t necessarily require lengthy responses.

Remember, your goal is to help the team whose project proposal you’re reviewing. The team will not lose points because of issues you point out, as long as they address them before I review their proposals. You should be critical, but respectful in your review. Also remember that you will be evaluated on the quality of your review. So that’s an additional incentive to do a good job.

Reviewee tasks

Once you receive feedback from your peers, you should address it. You should do this by directly updating your proposal or making any other updates to your repo as needed. You can make these updates all in one commit or spread them across multiple commits. Regardless, in the last commit that addresses the peer review comments, you should use a keyword in your commit message that will close the peer review issues. These keywords are close, closes, closed, fix, fixes, fixed, resolve, resolves, and resolved, and they need to be followed by the issue number (which you can find next to the issue title). So, your commit message can say something like “Finished updates based on peer review, fixes #1”.

Write-up

Your write-up should consist of three parts:

  1. Introduction (1-2 paragraphs): Brief introduction to the dataset. You may repeat some of the information about the dataset provided in its introduction on the TidyTuesday repository, paraphrasing in your own words. Imagine that your project is a standalone document and the grader has no prior knowledge of the dataset.

  2. Question 1: The title should relate to the question you’re answering.

    • Introduction (1-2 paragraphs): Introduction to the question and what parts of the dataset are necessary to answer the question. Also discuss why you’re interested in this question.

    • Approach (1-2 paragraphs): Describe what types of plots you are going to make to address your question. For each plot, provide a clear explanation as to why this plot (e.g. boxplot, barplot, histogram, etc.) is best for providing the information you are asking about. The two plots should be of different types, and at least one of the two plots needs to use either color mapping or facets.

    • Analysis (2-3 code blocks, 2 figures, text/code comments as needed): In this section, provide the code that generates your plots. Use scale functions to provide nice axis labels and guides. You are welcome to use theme functions to customize the appearance of your plot, but you are not required to do so. All plots must be made with ggplot2. Do not use base R or lattice plotting functions.

    • Discussion (1-3 paragraphs): In the Discussion section, interpret the results of your analysis. Identify any trends revealed (or not revealed) by the plots. Speculate about why the data looks the way it does.

  3. Question 2: Same structure outlined for Question 1, but for your new question. And the title should relate to the question you’re answering.
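To make the Analysis requirements concrete, here is a minimal sketch of one plot that uses a color mapping and scale functions for axis and legend labels. It uses the built-in mtcars data as a stand-in for a TidyTuesday dataset; the variable choices and label text are illustrative assumptions, not a required design.

```r
library(ggplot2)

# mtcars ships with R; it stands in for your TidyTuesday dataset
dat <- mtcars
dat$cyl <- factor(dat$cyl)  # treat cylinder count as categorical

# Three variables in one plot: two numerical (x, y), one categorical (color)
p <- ggplot(dat, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  # Scale functions supply readable axis and legend titles
  scale_x_continuous(name = "Weight (1000 lbs)") +
  scale_y_continuous(name = "Fuel economy (miles per gallon)") +
  scale_color_discrete(name = "Cylinders")

p
```

A second plot for the same question would need to be of a different type, for example a boxplot of the same outcome by group.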

We encourage you to be concise. A paragraph should typically not be longer than 5 sentences.

You are not required to perform any statistical tests in this project, but you may do so if you find it helpful to answer your question.

Presentation + slides

Slides

In addition to the written report, your team will also create presentation slides and deliver a presentation that summarizes and showcases your project. Introduce your research question and dataset, showcase visualizations, and discuss the primary conclusions. These slides should serve as a brief visual addition to your written report and will be graded for content and quality.

You can create your slides with any software you like (Keynote, PowerPoint, Google Slides, etc.). We recommend choosing an option that’s easy to collaborate with, e.g., Google Slides. If you choose this option, save the slides as PDF and upload it to your repo as presentation.pdf.

You can also use Quarto to make your slides! While we won’t be covering making slides with Quarto in the class, we would be happy to help you with it in office hours. It’s no different than writing other documents with Quarto, so the learning curve will not be steep!

The slide deck should be roughly 6 content slides + 1 title slide. Here is a suggested outline as you think through the slides; you do not have to use this exact format for the 6 slides.

  • Title Slide
  • Slide 1: Introduce the topic and motivation
  • Slide 2: Introduce the data
  • Slide 3: Main Question
  • Slide 4-5: Visualization 1 and 2
  • Slide 6: Conclusions + future work

Presentation

Presentations will take place in class during the last lab of the semester. The presentation must be no longer than 5 minutes. You will pre-record a video using one of the options below.

To pre-record your presentation, you may use any platform that works best for your group. Below are a few resources on recording videos:

Once your video is ready, upload the video to Panopto or another video platform (e.g., YouTube), then add a link to your video in your repo README.

To create your video with Panopto:

  • Go to https://arizona.hosted.panopto.com/ and sign in via your NetID
  • Click the “+” and select “Upload media”.
  • Drag and drop your recorded video.
  • Once you’ve uploaded the video to Panopto, click to share the video and copy the video’s URL. You will need this when you post the video in the discussion forum.

Grading Rubric:

Slides

Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.?

Presentation

  • Time management: Did the team divide the time well amongst themselves, or did they get cut off for going over time?
  • Professionalism: How well did the team present? Does the presentation appear to be well practiced? Did everyone get a chance to say something meaningful about the project?
  • Teamwork: Did the team present a unified story, or did it seem like independent pieces of work patched together?
  • Creativity and critical thought: Is the project carefully thought out? Does it appear that time and effort went into the planning and implementation of the project?
  • Content: Including, but not limited to the following:
    • Is the question well articulated in the presentation?
    • Can the question be answered with the data?
    • Does the analysis answer the question?
    • Are the conclusion(s) made based on the analysis justifiable?
    • Are the limitations carefully considered and articulated?

The reviewer / reviewee assignments for presentation peer-review are as follows:

reviewer        | reviewee_1      | reviewee_2
lestrada        | cwjohn13        | ultraviolets369
jrflemi5        | ultraviolets369 | CristinaUAMS
rahathelal      | CristinaUAMS    | manasppanse
josholmes       | manasppanse     | derekriceX
VeraJSolo       | derekriceX      | iriscys
cwjohn13        | iriscys         | dwiesneraz
ultraviolets369 | dwiesneraz      | Data Duo
CristinaUAMS    | Data Duo        | lestrada
manasppanse     | lestrada        | jrflemi5
derekriceX      | jrflemi5        | rahathelal
iriscys         | rahathelal      | josholmes
dwiesneraz      | josholmes       | VeraJSolo
Data Duo        | VeraJSolo       | cwjohn13

Also find the rubric here.

Website

Each of your projects will have a website that looks like https://info-526-su24.github.io/project-01/. You are not expected to change the styling of the website, but if you want to, you’ll need to edit the _quarto.yml file in your repo. Feel free to Google your way around it or ask on the discussion forum / office hours!

Repo organization

The following folders and files should be in your project repository:

  • /data/*: Your dataset

    • /data/*.csv: Your dataset in CSV format

    • /data/README.md: Metadata about your dataset including information on provenance, codebook, etc. [1]

  • index.qmd: Your project write-up

  • proposal.qmd: Your project proposal

  • presentation.qmd: Your project presentation

  • _quarto.yml: Setup file for project website

Grading

Total                                        | 100 pts
Proposal [2]                                 | 10 pts
Presentation [3]                             | 30 pts (25 pts from teaching team, 5 pts from audience)
Write-up [4]                                 | 30 pts
Reproducibility, style, and organization [5] | 10 pts
Within team peer evaluation [6]              | 15 pts
Between team peer evaluation [7]             | 5 pts

Some of the components are further detailed below.

Proposal (10 points)

  • Data - Dataset is in the data folder, along with a codebook in the README of that folder. (3 points)
  • Write-up - All required components included. (5 points)
  • Workflow - Peer review issues closed via commits. (1 point)
  • Teamwork - All team members must contribute to the repo via commits. (1 point)

Presentation (30 points)

Teaching team (25 points)

  • Time management: Did the team divide the time well amongst themselves, or did they get cut off for going over time? (2 points)

  • Professionalism: How well did the team present? Does the presentation appear to be well practiced? Did everyone get a chance to say something meaningful about the project? (2 points)

  • Teamwork: Did the team present a unified story, or did it seem like independent pieces of work patched together? (3 points)

  • Slides: Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.? (3 points)

  • Creativity and Critical Thought: Is the project carefully thought out? Does it appear that time and effort went into the planning and implementation of the project? (3 points)

  • Content: Question 1 and Question 2 will each be scored by the following criteria. Point values apply per question.

    • Is the question well articulated in the presentation? (1 point)
    • Can the question be answered with the data? (1 point)
    • Do(es) the data visualization(s) answer the question? (1 point)
    • Do(es) the data visualization(s) follow good visualization practices? (1 point)
    • Is/are the conclusion(s) made based on the visualization(s) justifiable? (1 point)
    • Are the limitations carefully considered and articulated? (1 point)

Peers (5 points)

  • Content: Is the research question well designed and is the data being used relevant to the research question? (1 point)
  • Content: Did the team use appropriate visualizations and did they interpret them accurately? (1 point)
  • Creativity and Critical Thought: Is the project carefully thought out? Are the limitations carefully considered? Does it appear that time and effort went into the planning and implementation of the project? (1 point)
  • Slides: Are the slides well organized, readable, not full of text, featuring figures with legible labels, legends, etc.? (1 point)
  • Professionalism: How well did the team present? Does the presentation appear to be well practiced? Are they reading off of a script? Did everyone get a chance to say something meaningful about the project? (1 point)

Write-up (30 points)

  • Introduction: The introduction provides a clear explanation of the question and the dataset used to answer the question, including a description of all relevant variables in the dataset. (2 points)

Question 1 and Question 2 will each be scored by the following criteria. Point values apply per question.

  • Justification of approach: The chosen analysis approach and visualizations are clearly explained and justified. (3 points)
  • Code: Code is correct, easy to read, properly formatted, and properly documented. (3 points)
  • Visualization: The visualizations are appropriate, easy to read, and properly labeled. (4 points)
  • Discussion: Discussion of results is clear and correct, and it has some depth without being excessively long. (4 points)

Reproducibility, style, and organization (10 points)

  • All required files are provided. Quarto files render without issues and reproduce the necessary outputs. (3 points)
  • Data is in the data folder, with a codebook in the README, and is loaded from this folder in the presentation and write-up. (3 points)
  • Documents are well structured and easy to follow. No extraneous materials. (2 points)
  • All issues are closed, mostly with specific commits addressing them. (2 points)

Guidelines

Please use the project repository that has been created for your team to complete your project. This means putting all of the content in the Quarto files provided, rendering them to obtain the output, and committing and pushing all files to your repository by the indicated deadlines. Your Quarto files (.qmd) and the resulting html files (.html) will be graded jointly, so they must be consistent (as in, don’t change the Quarto file without also updating the rendered document!).

All results presented must have corresponding code. If you do calculations by hand instead of using R and then report the results from the calculations, you will not receive credit for those calculations. Any answers/results given without the corresponding R code that generated the result will not be considered. For example, if you’re reporting the number of observations in your dataset, don’t just write the number manually, use inline R code to calculate the number. All code reported in your final project document should work properly. Please do not include any extraneous code or code which produces error messages. Code which produces certain warnings and messages is acceptable, as long as you understand what the warnings mean. In such cases you can add warning: false and message: false in the relevant R chunks. Warnings that signal lifecycle changes (e.g., a function is deprecated and there’s a newer/better function out there) should generally be addressed by updating your code, not just by hiding the warning.
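As a sketch of the chunk options mentioned above, the lines below show what the body of an R chunk in a Quarto file might look like. The chunk label and the message() call are assumptions for illustration; message() stands in for the startup output that some packages print when loaded.

```r
#| label: setup
#| warning: false
#| message: false

# With message: false, this message is not shown in the rendered document
message("Loading packages ...")

# Compute reported values in code rather than typing them by hand
avg_mpg <- round(mean(mtcars$mpg), 1)
```

These options only hide warnings and messages; they do not hide errors, so code that fails still needs to be fixed.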

Tips

  • You’re working in the same repo as your teammates now, so merge conflicts will happen, issues will arise, and that’s fine! Commit and push often, and ask questions when stuck.

  • Review the grading criteria above and ask questions if any of the expectations are unclear.

  • Make sure each team member is contributing, both in terms of quality and quantity of contribution (we will be reviewing commits from different team members).

  • Set aside time to work together and apart (physically).

  • Code:

    • In your write-up and your presentation, your code should be hidden (echo: false) so that your slides are neat and easy to read and your write-up is about your narrative and results only. However, your document should include all your code, such that if I re-render your Quarto file I should be able to obtain the results you presented. Exception: If you want to highlight something specific about a piece of code, you’re welcome to show that portion.

    • Even though your code is hidden, it should still conform to good coding practice, style, etc. You will be evaluated on code quality as well, which we will check by explicitly reviewing your Quarto documents (not just the rendered output).

  • Teamwork: You are to complete the assignment as a team. All team members are expected to contribute equally to the completion of this assignment and team evaluations will be given at its completion - anyone judged to not have sufficiently contributed to the final product will have their grade penalized. While different team members may have different backgrounds and abilities, it is the responsibility of every team member to understand how and why all code and approaches in the assignment work.

  • When you’re done, review the documents on GitHub to make sure you’re happy with the final state of your work. Then go get some rest!

Footnotes

  1. It is ok for you to repeat some information from the TidyTuesday repository, but make sure to appropriately attribute it here.

  2. Your proposal score will also take into consideration how much you’ve improved your proposal based on peer feedback.

  3. A more detailed grading rubric for this component to be added.

  4. A more detailed grading rubric for this component to be added.

  5. Style and format do count for this assignment, so please take the time to make sure everything looks good and your data and code are properly formatted.

  6. This relates to your contribution to the teamwork and how your team members evaluate this contribution. Scores for each teammate will be different for this component of the project grade.

  7. This relates to the quality and quantity of the peer review you provide for other teams.