Data Pipeline
Written by Kevin Wesley
Updated over a week ago

Only Org Admins can download this data via the AWS CLI.

To get the data:

  • Generate a new API token for s3_labs

  • Make an API call with:

curl --location --request GET '{{orgid}}/s3_labs' --header 'Authorization: Token {{token}}'

This returns temporary credentials for AWS S3 access that are valid for one hour.

  • Set the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables to the values given in the response from the previous API call, then run the following command in your terminal:

aws s3 sync s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/ .
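The steps above can also be scripted. The following is a minimal sketch, not an official client: it only builds the request and the sync command. The function names, the `url` and `dest` parameters, and the idea of passing the three credential values in by hand are all assumptions made for illustration; the exact shape of the API response is not described here, so reading the values out of it is left to the caller.

```python
import os
import subprocess
import urllib.request

BUCKET = "vocareum-us-west-2-lab-analytics"  # bucket name from the sync command above


def build_credentials_request(url, token):
    """Build the GET request for the s3_labs endpoint (url is supplied by the caller)."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Token {token}"}, method="GET"
    )


def build_sync_command(org_id, access_key, secret_key, session_token, dest="."):
    """Return (argv, env) for `aws s3 sync` using the temporary credentials."""
    env = dict(os.environ)  # copy, so the current process environment is unchanged
    env["AWS_ACCESS_KEY_ID"] = access_key
    env["AWS_SECRET_ACCESS_KEY"] = secret_key
    env["AWS_SESSION_TOKEN"] = session_token
    argv = ["aws", "s3", "sync", f"s3://{BUCKET}/org{org_id}/", dest]
    return argv, env


def run_sync(argv, env):
    """Run the sync; requires the AWS CLI to be installed and on PATH."""
    subprocess.run(argv, env=env, check=True)
```

Because the credentials expire after one hour, a script like this would need to request fresh ones before each sync rather than storing them.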

Directory Structure

[Figure: directory structure]

Lab Sessions, AWS Cost, AWS Service Spend, and Azure Cost

The service that generates these files runs every 6 hours and collects all of the previous day's data (by UTC date). Lab Sessions, AWS Cost, and Azure Cost are daily files; AWS Service Spend is a monthly file.
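Because the "previous day" is defined in UTC, it can differ from the previous day in your local time zone. As a small illustration (the helper name `previous_utc_day` is hypothetical), the day covered by a run at a given moment could be computed like this:

```python
from datetime import datetime, timedelta, timezone


def previous_utc_day(now=None):
    """Return the UTC calendar date that precedes the given moment."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC first so local offsets do not shift the date.
    return (now.astimezone(timezone.utc) - timedelta(days=1)).date()
```

For example, 8:00 pm on February 24 at UTC-8 is already February 25 in UTC, so the previous UTC day is February 24, not February 23.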


Lab Sessions

Records all lab types (VM, Container, Cloud).

"User ID", "User Email", "Course ID", "Course Name", "Assignment ID", "Assignment Name", "Part ID", "Part Name", "Lab Type", "Start Time", "End Time", "Duration (min)", Log, Tags, "Launch Time", "Init Time", "Launch To Ready Duration (mins)", "Concurrent Labs Count", "Disconnects", "Latency Time (ms)"

  • Launch Time: when "Start Lab" is clicked

  • Start Time: when EC2 instance is started

  • Init Time: when lab is actually ready to use

  • Launch To Ready Duration (mins): time it takes from launch time to init time

  • Concurrent Labs Count: number of labs already started when user starts lab

  • Disconnects: number of times user gets disconnected


1731xxx,, 45686, "Kevin 2021", 760081, lab-04-api, 760084, assignment-part, Cloud,"Feb-23-2022 9:40:20 pm UTC", "Feb-23-2022 11:40:22 pm UTC", 120.03333333333," START successful; CFN creation successful; END successful (Session Timer Expires)",,"Jun-01-2022 9:14:57 pm UTC", "Jun-01-2022 9:15:02 pm UTC", 0.083333333333333, 180
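The time columns use a format like "Feb-23-2022 9:40:20 pm UTC". The sketch below shows one way to parse that format and recompute the duration columns; the function names are hypothetical, and the format string is inferred from the example row above rather than from a specification.

```python
from datetime import datetime, timezone

# Inferred from the example row, e.g. "Feb-23-2022 9:40:20 pm UTC".
# %b expects English month abbreviations (the default C/English locale).
TIME_FORMAT = "%b-%d-%Y %I:%M:%S %p UTC"


def parse_lab_time(text):
    """Parse a lab session timestamp into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(text.strip(), TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def minutes_between(start, end):
    """Return the number of minutes from one timestamp string to another."""
    delta = parse_lab_time(end) - parse_lab_time(start)
    return delta.total_seconds() / 60
```

Applied to the example row, the Start Time and End Time are 2 hours and 2 seconds apart, which matches the Duration (min) value of about 120.03, and the Launch Time and Init Time are 5 seconds apart, which matches the Launch To Ready Duration of about 0.083.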


AWS Cost

Spend accumulated per user.

"AWS Account ID", "User ID", "User Email", "Part ID", "Part Name", Cost, "File Updated Time"


8622xxxx, 1731xxxx,, 760084, assignment-part, 0.038598, 1645823566
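A file in this layout can be read with a standard CSV parser. The sketch below totals Cost per User ID; it assumes the file starts with the header row shown above, and it treats File Updated Time as Unix epoch seconds, which is an inference from the example value rather than something stated here. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone

# Header and example row copied from the text above.
SAMPLE = (
    '"AWS Account ID", "User ID", "User Email", "Part ID", "Part Name", Cost, "File Updated Time"\n'
    "8622xxxx, 1731xxxx,, 760084, assignment-part, 0.038598, 1645823566\n"
)


def cost_per_user(csv_text):
    """Sum the Cost column for each User ID."""
    # skipinitialspace handles the space that follows each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    totals = defaultdict(float)
    for row in reader:
        totals[row["User ID"]] += float(row["Cost"])
    return dict(totals)


def updated_time(value):
    """Interpret File Updated Time as Unix epoch seconds (an assumption)."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)
```

Under that assumption, the example value 1645823566 corresponds to February 25, 2022, 21:12:46 UTC.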


Azure Cost

Spend accumulated per user.

"User ID","User Email","Part ID","Part Name",Month,Year,Monthly-cost-to-date


8622xxxx,, 760084, assignment-part, 10, 2022, 0.07


AWS Service Spend

Cost accumulated for a particular AWS service per account.

"AWS Account ID", "User Id", "User Email", "Course ID", "Part ID", Service, Cost


862252xxx, 173xxxx,, 45686,760084, "Amazon Elastic Compute Cloud", 0.037481

Assignment Mapping, Course Mapping, Grades, and Enrollment

The service that generates mapping, grades, and enrollment data runs every 24 hours.


Assignment Mapping

List of assignments in a course and copies of those assignments.

org_id, course_id, assignment_id, assignment_name, part_id, part_name, parent_assignment_id, parent_part_id


9xxx, 1020, 190155, "AWS Sandbox", 190156, Lab, 50833, 50834
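The parent columns make it possible to trace a copied part back to the part it was copied from. A minimal sketch follows, assuming the file starts with the header row shown above and that rows for original assignments leave parent_part_id empty (that second point is an assumption); the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Header and example row copied from the text above.
SAMPLE = (
    "org_id, course_id, assignment_id, assignment_name, part_id, part_name, "
    "parent_assignment_id, parent_part_id\n"
    '9xxx, 1020, 190155, "AWS Sandbox", 190156, Lab, 50833, 50834\n'
)


def copies_by_parent(csv_text):
    """Group part_id values under the parent_part_id they were copied from."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    groups = defaultdict(list)
    for row in reader:
        parent = row["parent_part_id"]
        if parent:  # skip rows that have no parent
            groups[parent].append(row["part_id"])
    return dict(groups)
```

Grouping by parent in this way lets lab sessions or costs recorded against several copies be rolled up to the original part.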


Course Mapping

List of courses in the org and clones of those courses.

org_id, org_name, course_id, course_name, create_method, parent_course_id


9xxx, "Example Org", 4051, "Example CS 17 - 1", "Clone by Reference", 10x


Grades

courseid, coursename, asnid, asnname, ispeerreviewon, maxscore, userid, partnerid, asnscore, isoverride, partid, partname, partscore, rubricid, rubricname, rubricscore, graderid


20154, "Test Course", 412074, "Makeup HW 1 (grading script)", 0, 14, 9260xx, 10, 0, 412075, "Makeup HW 1", 10, 551636, Score, 10, 926xx


Enrollment

userid, courseid, role, name, email, client_key, client_userid, dropped, organization_terms_agreed, organization_terms_agreed_date_utc


1731xxx, 20154, student,, demokey, 9999999, 0,,
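Grades and Enrollment both carry userid and courseid, so the two files can be joined on that pair. The sketch below is one possible way to do so. The sample rows are made up for illustration and include only the columns the code uses; the function names are hypothetical.

```python
import csv
import io

# Hypothetical rows, reduced to the columns used below.
GRADES = "courseid, userid, asnid, asnscore\n20154, 1001, 412074, 10\n20154, 1002, 412074, 7\n"
ENROLLMENT = "userid, courseid, role\n1001, 20154, student\n"


def index_enrollment(enrollment_csv):
    """Index enrollment rows by (userid, courseid)."""
    reader = csv.DictReader(io.StringIO(enrollment_csv), skipinitialspace=True)
    return {(row["userid"], row["courseid"]): row for row in reader}


def grades_with_roles(grades_csv, enrollment_csv):
    """Attach the enrollment role to each grade row (None if not enrolled)."""
    enrolled = index_enrollment(enrollment_csv)
    result = []
    for grade in csv.DictReader(io.StringIO(grades_csv), skipinitialspace=True):
        match = enrolled.get((grade["userid"], grade["courseid"]))
        result.append({
            "userid": grade["userid"],
            "asnid": grade["asnid"],
            "asnscore": float(grade["asnscore"]),
            "role": match["role"] if match else None,
        })
    return result
```

Rows without a matching enrollment are kept with a role of None rather than dropped, so missing enrollment data stays visible.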

Data Terms


lab sessions

user_id: Vocareum id of user
user_email: email of user on Vocareum
course_id: Vocareum id of course
course_name: name of course
assignment_id: Vocareum id of assignment
assignment_name: name of assignment
part_id: Vocareum id of part
part_name: name of part
lab_type: one of these options: 'Cloud', 'Container', 'Databricks', 'VM'
end_time: time lab was ended
duration_min: length of lab (end_time - start_time)
log: event messages (e.g. LAUNCH initiated, START successful)
latency_time_ms: estimated measure of user's internet speed

Note: "tags" in lab sessions is used only internally.


aws cost

aws_account_id: AWS id of account assigned to user
user_id: Vocareum id of user
user_email: email of user on Vocareum
part_id: Vocareum id of part
part_name: name of part
cost: amount spent, in dollars, for this AWS account
file_updated_time: when the file was updated
deactivation_date: when the AWS account was deactivated


grades

courseid: Vocareum id of course
coursename: name of Vocareum course
asnid: Vocareum id of assignment
asnname: name of assignment
ispeerreviewon: 1/true if the assignment uses peer review grading
maxscore: highest score a user can receive
userid: Vocareum id of user
partnerid: Vocareum id of partner (if it is a team assignment)
asnscore: score that the user received for the assignment
isoverride: 1/true if the user's grade was overridden by a teacher
partid: Vocareum id of part
partname: name of part
partscore: score that the user received for the part
rubricid: Vocareum id of rubric (a.k.a. grading criterion)
rubricname: name of rubric
rubricscore: score that the user received for that rubric
graderid: Vocareum id of user who graded the assignment
submission: number indicating which submission the row is for (e.g. submission = 2 means user's second submission)
submission_time: when the assignment was submitted

service spend

user_id: Vocareum id of user
user_email: email of user on Vocareum
course_id: Vocareum id of course
part_id: Vocareum id of part
service: name of AWS service
cost: amount spent, in dollars, for this AWS service
