We will be using GitHub Classroom to manage the mechanics of assignment submission. Start by sorting through these details. As a one-time action, we will need to explicitly link your GitHub account to your UWaterloo directory ID. That is, we need to know that it is you submitting the assignment. You can use whatever GitHub account you want: it is equally okay to use your personal account or a course-specific account. We just need to verify this association.
‼️ Do not wait until the last minute to go through these steps, as there is a blocker that requires attention from the instructor/TAs.
Start by following this invitation link to assignment 1.
Next, on Piazza, send the instructors a private message, as follows:
Subject: UWATERLOO_USERID: GitHub account ACCOUNT
I am YOUR NAME UWATERLOO_USERID and I certify that my GitHub account is ACCOUNT.
Bind the variables as appropriate. So, for example:
Subject: n85ahmed: GitHub account ahmednafis00
I am Nafis Ahmed n85ahmed and I certify that my GitHub account is ahmednafis00.
With this message, you are explicitly confirming your identity and the association between you and your GitHub account.
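If it helps to avoid typos, the template above can be filled in mechanically. The following is only a sketch: the function name `piazza_message` is hypothetical and not part of any course tooling, and the values passed in are the ones from the example above.

```python
def piazza_message(userid: str, name: str, account: str) -> tuple[str, str]:
    """Return (subject, body) for the private Piazza message.

    Hypothetical helper: it only substitutes the three variables
    into the template given in the instructions.
    """
    subject = f"{userid}: GitHub account {account}"
    body = f"I am {name} {userid} and I certify that my GitHub account is {account}."
    return subject, body


# Values taken from the worked example in the instructions.
subject, body = piazza_message("n85ahmed", "Nafis Ahmed", "ahmednafis00")
print("Subject:", subject)
print(body)
```

Replace the three arguments with your own user ID, name, and GitHub account, then paste the two printed lines into Piazza.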
After receiving this message on Piazza, we will add you to the "students" team of the organization cs451-2025f-lin.
You should then receive an invitation, viewable at this link.
Once you accept the invitation, you should now be able to view the actual assignment.
If you're having issues joining the organization cs451-2025f-lin, go to https://github.com/cs451-2025f-lin.
There should be an invitation banner at the top that you can access.
Your own fork of the assignment should be at https://github.com/cs451-2025f-lin/assignment-1-ACCOUNT, where ACCOUNT is your GitHub account.
You might need to visit https://github.com/cs451-2025f-lin/assignment-1-ACCOUNT/invitations first to accept the invitation.
After doing this, you should now be the admin of your own repo.
Next, you'll need to limit access by others to your assignment repo.
Go to https://github.com/cs451-2025f-lin/assignment-1-ACCOUNT/settings/access.
By default, all members of the team "students" have access to your repo.
This should not be the case: please remove the team's access by clicking on the little trash icon.
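The repository URLs mentioned above all follow one pattern, so they can be generated from your account name. This is a small sketch; the helper `repo_urls` is a hypothetical name, and the only inputs are the organization name and URL paths given in the instructions.

```python
ORG = "cs451-2025f-lin"  # organization name from the instructions


def repo_urls(account: str, assignment: str = "assignment-1") -> dict[str, str]:
    """Build the three URLs the instructions mention for one GitHub account.

    Hypothetical helper: it only substitutes ACCOUNT into the URL pattern.
    """
    base = f"https://github.com/{ORG}/{assignment}-{account}"
    return {
        "repo": base,
        "invitations": f"{base}/invitations",
        "access": f"{base}/settings/access",
    }


# Example account taken from the instructions; substitute your own.
for label, url in repo_urls("ahmednafis00").items():
    print(f"{label}: {url}")
```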
To confirm that everything is in order, check that you can make changes to your own repo. (For example, clone locally, make a small edit, and then push back to origin.)
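One possible way to script this check is sketched below. It rests on several assumptions: `git` is installed and already authenticated against GitHub, your commit identity is configured, and an empty commit (`--allow-empty`) is an acceptable stand-in for the "small edit". The function names are hypothetical. Running the file as-is only prints the commands; call `run_push_check` yourself if you want them executed.

```python
import shlex
import subprocess

ORG = "cs451-2025f-lin"  # organization name from the instructions


def push_check_commands(account: str) -> list[list[str]]:
    """Return the git commands for a clone / commit / push round trip."""
    repo = f"assignment-1-{account}"
    url = f"https://github.com/{ORG}/{repo}"
    return [
        ["git", "clone", url, repo],
        # Assumption: an empty commit replaces the manual small edit.
        ["git", "-C", repo, "commit", "--allow-empty", "-m", "Test push access"],
        # Push whatever branch the clone checked out back to origin.
        ["git", "-C", repo, "push", "origin", "HEAD"],
    ]


def run_push_check(account: str) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for cmd in push_check_commands(account):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands; nothing is executed here.
    for cmd in push_check_commands("ACCOUNT"):
        print(shlex.join(cmd))
```

If the final push succeeds, you have write access to your own repo.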
If you've gotten here, you're done with the "mechanics"... Congrats! 🎉
The Jupyter notebook that contains the actual assignment is available here.
The data file a1-brand.csv needed for this assignment is available here.
Before you start the assignment, make sure you've already installed the software needed for the course: instructions are provided here. You'll be ready to start the assignment once you are able to launch JupyterLab and view the assignment notebook in a browser.
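A quick sanity check of the Python environment might look like the sketch below. The module names `pyspark` and `jupyterlab` are assumptions based on the tools this page mentions; the course installation instructions remain the authoritative list. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed module names; adjust to match the course installation instructions.
REQUIRED = ["pyspark", "jupyterlab"]

missing = missing_modules(REQUIRED)
if missing:
    print("Not found in this environment:", ", ".join(missing))
else:
    print("All assumed modules were found.")
```

Finding the modules does not replace the real test, which is launching JupyterLab and opening the assignment notebook in a browser.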
For this assignment, feel free to use generative AI to help you. Yes, you read that correctly. Go ahead and use ChatGPT, Claude, Gemini, etc. The primary point of this assignment is to make sure that you've correctly installed and configured the software needed for the rest of the course. That's great if generative AI tools help you accomplish this. Furthermore, examples of word count using PySpark abound on the web, so using generative AI isn't a big deal. However, you would be well advised to use generative AI as a tool to help you learn the material, as opposed to a crutch that helps you speed run through the assignment without actually learning anything.
Here's what you'll turn in:
assignment1.ipynb: a copy of the assignment Jupyter notebook, but with code filled in.
top10_words.csv: CSV of top-10 most frequently occurring words (including stopwords).
top10_noStopWords.csv: CSV of top-10 most frequently occurring words (discarding stopwords).
assignment1_genai.md: a markdown file explaining how you used generative AI for this assignment. If you did not use generative AI at all, say so in the file.
Specifically, make sure your assignment repo has the above files.
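Before pushing, a short script can confirm that the four files are present. This sketch assumes the files sit at the top level of the repo, which these instructions do not state explicitly; `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names as listed in the instructions.
REQUIRED_FILES = [
    "assignment1.ipynb",
    "top10_words.csv",
    "top10_noStopWords.csv",
    "assignment1_genai.md",
]


def missing_files(repo_dir="."):
    """Return the required file names not present directly under repo_dir."""
    root = Path(repo_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from the root of the assignment repo.
    missing = missing_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All required files are present.")
```

The check covers file names only; it does not validate the contents of the notebook or the CSV files.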
top10_words.csv: 7 points
top10_noStopWords.csv: 5 points