Data Pipeline
Written by Kevin Wesley
Updated over a week ago

Only Org Admins can download this data. The download is done with the AWS CLI.

To get the data:

  • Generate a new API token for s3_labs

  • Make an API call with:

curl --location --request GET 'https://api.vocareum.com/api/v2/orgs/{{orgid}}/s3_labs' --header 'Authorization: Token {{token}}'

The response contains temporary AWS S3 credentials that are valid for one hour.

  • Run the following commands in your terminal:

export AWS_ACCESS_KEY_ID=ASIAIOSFODNN7EXAMPLE

export AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

export AWS_SESSION_TOKEN=AQoDYXdzEJr...

aws s3 sync s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/ .

(The values of AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN come from the response to the previous API call. The values shown above are placeholders.)
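The steps above can also be scripted. The following is a minimal Python sketch, not an official client. The function names are hypothetical, and because this article does not show the JSON layout of the API response, the code takes the three credential values as plain arguments instead of guessing at field names. It assumes the AWS CLI is installed and on your PATH.

```python
import os
import subprocess

API_URL = "https://api.vocareum.com/api/v2/orgs/{org_id}/s3_labs"
BUCKET = "vocareum-us-west-2-lab-analytics"


def credentials_request(org_id, token):
    """Return the URL and headers for the s3_labs call shown above."""
    url = API_URL.format(org_id=org_id)
    headers = {"Authorization": f"Token {token}"}
    return url, headers


def sync_command(org_id, dest="."):
    """Build the `aws s3 sync` command for one org."""
    return ["aws", "s3", "sync", f"s3://{BUCKET}/org{org_id}/", dest]


def aws_env(access_key_id, secret_access_key, session_token):
    """Copy the current environment and add the temporary credentials."""
    env = dict(os.environ)
    env["AWS_ACCESS_KEY_ID"] = access_key_id
    env["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    env["AWS_SESSION_TOKEN"] = session_token
    return env


def sync_org_data(org_id, access_key_id, secret_access_key, session_token, dest="."):
    """Run the sync; raises CalledProcessError if the AWS CLI reports a failure."""
    env = aws_env(access_key_id, secret_access_key, session_token)
    subprocess.run(sync_command(org_id, dest), env=env, check=True)
```

Passing the credentials through `env` keeps them out of your shell history and limits them to the one child process. Because the credentials expire after an hour, a long-running script would need to request fresh ones before each sync.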

Directory Structure

s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/lab_sessions/lab_sessions_y-m-d.csv

s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/cloud_spend/aws_cost_y-m-d.csv

s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/cloud_spend/azure/azure_cost_y-m-d.csv

s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/service_spend/service_spend_y-m.csv

s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/grades/course{{courseid}}/grades.csv

s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/course-asn_mapping/{{course/asn}}_mapping.csv

s3://vocareum-us-west-2-lab-analytics/org{{orgid}}/enrollment/course{{courseid}}/enrollment.csv
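After a sync, the local folder mirrors this layout, so a file's dataset can be read off its path. The sketch below is one way to do that in Python. The function name and the returned labels are hypothetical, and the mapping file names are assumed to follow the `{{course/asn}}_mapping.csv` pattern listed above.

```python
from pathlib import PurePosixPath

# Top-level folders whose name is also the dataset name.
SIMPLE_FOLDERS = {"lab_sessions", "service_spend", "grades", "enrollment"}


def dataset_for(relative_path):
    """Name the dataset for a path relative to the org{{orgid}}/ folder."""
    path = PurePosixPath(relative_path)
    parts = path.parts
    if not parts:
        return None
    top = parts[0]
    if top == "cloud_spend":
        # Azure files sit one level deeper than the AWS cost files.
        if len(parts) > 2 and parts[1] == "azure":
            return "azure_cost"
        return "aws_cost"
    if top == "course-asn_mapping":
        # File stem is "course_mapping" or "asn_mapping".
        return path.stem
    if top in SIMPLE_FOLDERS:
        return top
    return None  # unknown folder
```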

Lab Sessions, AWS Cost, AWS Service Spend, and Azure Cost

The service that generates these files runs every 6 hours and collects all of the previous day's data (UTC). Lab Sessions, AWS Cost, and Azure Cost are daily files; Service Spend is a monthly file.
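Since each run covers the previous UTC day, the newest day you can expect data for is "yesterday in UTC", whatever your local time zone is. A minimal Python sketch of that calculation follows; the function name is hypothetical. It only computes the date. Whether a file for that date has already been written depends on when the job last ran.

```python
from datetime import datetime, timedelta, timezone


def latest_reported_day(now=None):
    """Return the UTC calendar date covered by the most recent run."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Convert to UTC first so local-time callers get the right calendar day.
    return (now.astimezone(timezone.utc) - timedelta(days=1)).date()
```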

lab_sessions

Records all lab types (VM, Container, Cloud, Databricks).

"User ID", "User Email", "Course ID", "Course Name", "Assignment ID", "Assignment Name", "Part ID", "Part Name", "Lab Type", "Start Time", "End Time", "Duration (min)", Log, Tags, "Launch Time", "Init Time", "Launch To Ready Duration (mins)", "Concurrent Labs Count", "Disconnects", "Latency Time (ms)"

  • Launch Time: when "Start Lab" is clicked

  • Start Time: when the EC2 instance is started

  • Init Time: when the lab is actually ready to use

  • Launch To Ready Duration (mins): minutes elapsed between Launch Time and Init Time

  • Concurrent Labs Count: number of labs already running when the user starts the lab

  • Disconnects: number of times the user was disconnected

ex:

1731xxx, user@vocareum.com, 45686, "Kevin 2021", 760081, lab-04-api, 760084, assignment-part, Cloud,"Feb-23-2022 9:40:20 pm UTC", "Feb-23-2022 11:40:22 pm UTC", 120.03333333333," START successful; CFN creation successful; END successful (Session Timer Expires)",,"Jun-01-2022 9:14:57 pm UTC", "Jun-01-2022 9:15:02 pm UTC", 0.083333333333333, 180
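The timestamps in this file are not ISO 8601, so they need an explicit format string before you can do arithmetic on them. The Python sketch below parses the format seen in the example row and recomputes a duration. The function names are hypothetical, and the code assumes English month abbreviations and that every timestamp ends in "UTC" as in the example.

```python
from datetime import datetime, timezone

# Format of values such as "Feb-23-2022 9:40:20 pm UTC".
TIME_FORMAT = "%b-%d-%Y %I:%M:%S %p UTC"


def parse_lab_time(value):
    """Parse a lab_sessions timestamp into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value.strip(), TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def minutes_between(earlier, later):
    """Elapsed minutes between two lab_sessions timestamp strings."""
    delta = parse_lab_time(later) - parse_lab_time(earlier)
    return delta.total_seconds() / 60
```

For the example row, `minutes_between` applied to Start Time and End Time gives about 120.03, which matches the Duration (min) column. Applied to Launch Time and Init Time, it gives about 0.083, which matches Launch To Ready Duration (mins).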

aws_cost

Spend accumulated per user.

"AWS Account ID", "User ID", "User Email", "Part ID", "Part Name", Cost, "File Updated Time"

ex:

8622xxxx, 1731xxxx, user@vocareum.com, 760084, assignment-part, 0.038598, 1645823566
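A common use of this file is totaling spend per user. The following Python sketch reads rows in the layout shown above and sums the Cost column. The function names are hypothetical, and two points are assumptions, not documented behavior: the code tolerates a header line whether or not the file has one, and it treats "File Updated Time" as Unix epoch seconds, which is what the example value looks like.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone
from decimal import Decimal

# Column order taken from the header shown above.
COLUMNS = ["AWS Account ID", "User ID", "User Email", "Part ID",
           "Part Name", "Cost", "File Updated Time"]


def read_aws_cost(text):
    """Return one dict per data row, skipping a header line if present."""
    rows = []
    # skipinitialspace handles the ", " separators used in the examples.
    for cells in csv.reader(io.StringIO(text), skipinitialspace=True):
        cells = [cell.strip() for cell in cells]
        if not cells or cells[0] == COLUMNS[0]:
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def cost_per_user(rows):
    """Sum the Cost column for each User ID, using Decimal to avoid float drift."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row["User ID"]] += Decimal(row["Cost"])
    return dict(totals)


def file_updated_at(row):
    """Assumption: "File Updated Time" is Unix epoch seconds."""
    return datetime.fromtimestamp(int(row["File Updated Time"]), tz=timezone.utc)
```

The same reader works for service_spend if you swap in that file's column list and group on Service instead of User ID.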

azure_cost

Spend accumulated per user.

"User ID","User Email","Part ID","Part Name",Month,Year,Monthly-cost-to-date

ex:

8622xxxx, user@vocareum.com, 760084, assignment-part, 10, 2022, 0.07

service_spend

Cost accumulated for a particular AWS service per account.

"AWS Account ID", "User Id", "User Email", "Course ID", "Part ID", Service, Cost

ex:

862252xxx, 173xxxx, user@vocareum.com, 45686,760084, "Amazon Elastic Compute Cloud", 0.037481

Assignment Mapping, Course Mapping, Grades, and Enrollment

The service that generates mapping, grades, and enrollment data runs every 24 hours.

asn_mapping

List of assignments in a course and copies of those assignments.

org_id, course_id, assignment_id, assignment_name, part_id, part_name, parent_assignment_id, parent_part_id

ex:

9xxx, 1020, 190155, "AWS Sandbox", 190156, Lab, 50833, 50834
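The parent columns let you roll copied assignments up to the assignment they were copied from, for example to report on one lab across many courses. The Python sketch below groups copies by parent. The function name is hypothetical, rows are assumed to be dicts keyed by the column names above (as `csv.DictReader` would produce), and an empty parent_assignment_id is assumed to mean "not a copy", which this article does not confirm.

```python
from collections import defaultdict


def copies_by_parent(rows):
    """Map each parent_assignment_id to the assignment_ids copied from it."""
    copies = defaultdict(list)
    for row in rows:
        parent = (row.get("parent_assignment_id") or "").strip()
        if parent:  # assumption: empty means the assignment is not a copy
            copies[parent].append(row["assignment_id"])
    return dict(copies)
```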

course_mapping

List of courses in the org and clones of those courses.

org_id, org_name, course_id, course_name, create_method, parent_course_id

ex:

9xxx, "Example Org", 4051, "Example CS 17 - 1", "Clone by Reference", 10x

grades

courseid, coursename, asnid, asnname, ispeerreviewon, maxscore, userid, partnerid, asnscore, isoverride, partid, partname, partscore, rubricid, rubricname, rubricscore, graderid

ex:

20154, "Test Course", 412074, "Makeup HW 1 (grading script)", 0, 14, 9260xx, 10, 0, 412075, "Makeup HW 1", 10, 551636, Score, 10, 926xx

enrollment

userid, courseid, role, name, email, client_key, client_userid, dropped, organization_terms_agreed, organization_terms_agreed_date_utc

ex:

1731xxx, 20154, student, user@vocareum.com, demokey, 9999999, 0,,
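The grades file identifies users only by userid, while the enrollment file for the same course carries their email. Joining the two on userid is therefore a typical first step. The following is a small Python sketch of that join. The function name is hypothetical, and rows are assumed to be dicts keyed by the column names listed above.

```python
def grades_with_email(grade_rows, enrollment_rows):
    """Return copies of the grade rows with an added "email" key.

    Users missing from the enrollment rows get email=None.
    """
    emails = {row["userid"]: row["email"] for row in enrollment_rows}
    return [{**grade, "email": emails.get(grade["userid"])} for grade in grade_rows]
```

Both inputs should come from the same course folder, since grades and enrollment are written per course.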

Data Terms

lab_sessions

user_id: Vocareum id of user
user_email: email of user on Vocareum
course_id: Vocareum id of course
course_name: name of course
assignment_id: Vocareum id of assignment
assignment_name: name of assignment
part_id: Vocareum id of part
part_name: name of part
lab_type: one of these options: 'Cloud', 'Container', 'Databricks', 'VM'
end_time: time lab was ended
duration_min: length of lab in minutes (end_time - start_time)
log: event messages (e.g. LAUNCH initiated, START successful)
latency_time_ms: estimated measure of user's internet speed

Note: the "tags" column in lab_sessions is used internally only.

aws_cost

aws_account_id: AWS id of account assigned to user
user_id: Vocareum id of user
user_email: email of user on Vocareum
part_id: Vocareum id of part
part_name: name of part
cost: amount, in dollars, spent by this AWS account
file_updated_time: when the file was updated
deactivation_date: when the AWS account was deactivated


grades

coursename: name of Vocareum course
asnid: Vocareum id of assignment
asnname: name of assignment
ispeerreviewon: 1/true if the assignment uses peer review grading
maxscore: highest score a user can receive
userid: Vocareum id of user
partnerid: Vocareum id of partner (if it is a team assignment)
asnscore: score that the user received for the assignment
isoverride: 1/true if the user's grade was overridden by a teacher
partid: Vocareum id of part
partname: name of part
partscore: score that the user received for the part
rubricid: Vocareum id of rubric (a.k.a. grading criterion)
rubricname: name of rubric
rubricscore: score that the user received for that rubric
graderid: Vocareum id of user who graded the assignment
submission: number indicating which submission the row is for (e.g. submission = 2 means user's second submission)
submission_time: when the assignment was submitted

service_spend

user_id: Vocareum id of user
user_email: email of user on Vocareum
course_id: Vocareum id of course
part_id: Vocareum id of part
service: name of AWS service
cost: amount, in dollars, spent on this AWS service
