What is DS @ OSU?

Data Science @ OSU (DS @ OSU) places powerful computational environments and resources at the fingertips of students and instructors. DS @ OSU is designed to support interactive data science and provides over 40 programming languages, including Python, R, and Julia, in a streamlined and highly accessible cloud environment. DS @ OSU supports integrated explanations and coding, making data science understandable, repeatable, and shareable.

Integrated with Canvas, DS @ OSU gives students seamless access not only to assignments and learning materials but also to their own personal learning and computing environments, all within a web browser. The Canvas integration also gives faculty and student TAs straightforward access to review and provide feedback on student assignments.

DS @ OSU Benefits:

Students learn and code in cutting-edge computing environments through a web browser, with access to powerful data science services and resources and nothing to install. DS @ OSU supports students in following instruction and in writing and testing their own code at their own pace, in their own environment.

  • Open access to powerful computing environments
  • Cloud-based software accessed from your browser
  • Integrated with Canvas for seamless learning 
  • Reduced reliance on classroom computers and labs to complete coursework

Powered by Digital Research & Infrastructure

DS @ OSU is provided by Digital Research & Infrastructure (DRI). DRI supports research and classroom instructional computing at Oregon State University, providing easy access to advanced high-performance computing (HPC) clusters, performance storage, data sharing, server housing space for research colocation, and consultation services for remote cloud computing. In addition, DRI can provide training on the use of both on-site HPC resources and large national HPC centers.

 

FAQs

 

Contact DRI to get started with DS @ OSU as an instructor. Students will access DS @ OSU through Canvas.

A Canvas course is available to support instructors using DS @ OSU. Contact DRI to be added to the Canvas course or with any questions.

DS @ OSU currently offers the following features:

  • Based on the datascience-notebook Jupyter Docker Stack, we support:

    • JupyterLab, the latest-gen Jupyter interface
    • Tools and Interfaces:
      • Jupyter notebooks & Python 3
      • R, RStudio, and R Shiny
      • Julia and bash (command-line)
    • A wide array of pre-installed Python and R packages
  • For each Hub (generally we set up one Hub per class), a shared storage space with "classroom" permissions:

    • Students can read+write in their own home directories
    • Instructors (or other admins such as TAs) can directly browse and edit student data
    • A hub_data_share for instructor staging of data and code
  • All users can install scripts and R and Python packages for their own use

  • Instructors can install scripts and R and Python packages for everyone

  • Additional hooks for instructors to customize user environments

  • Automatic login via Canvas, including support for social logins from Canvas Studio Sites

    • Available as links from Assignments and Modules
    • Specific Canvas Roles (e.g. Instructors and/or TAs) determine Admin-level access within a Hub
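
As a rough illustration of the per-user package option listed above, here is a minimal Python sketch that builds and runs a user-level pip install from inside a notebook. It rests on assumptions not stated on this page: that pip is available in the Hub's Python environment and that a `--user` install is the appropriate mechanism there. The helper names `user_install_command` and `install_for_me` are hypothetical and are not part of DS @ OSU.

```python
import subprocess
import sys


def user_install_command(package: str) -> list[str]:
    """Build a pip command that installs `package` for the current user only."""
    # sys.executable pins pip to the interpreter that is running this kernel,
    # so the package lands where the notebook can import it.
    # Assumption: a --user install is permitted in the Hub environment.
    return [sys.executable, "-m", "pip", "install", "--user", package]


def install_for_me(package: str) -> None:
    """Run the install; raises subprocess.CalledProcessError if pip fails."""
    subprocess.run(user_install_command(package), check=True)
```

Calling `install_for_me("plotnine")` in a notebook cell would attempt the install for the current user only; a kernel restart may be needed before the new package can be imported.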

As much as possible, we've designed DS @ OSU on the principle of making the easy things easy, and the hard things possible. Simultaneously, we've built a system that scales up and down, able to support dozens to hundreds (potentially thousands) of students. Lastly, a number of features and backend support systems are still in development and testing. 

Please be especially aware of these items as a user of DS @ OSU:

  • Autoscaling Wait Times: Hub access can at times be delayed for several minutes while the cloud resources that support the Hub are created on the fly.
    • This can often be avoided with planning. We ask that Instructors have a working understanding of computing resources to help manage their class (described in the "Cloud Server Management" sections).
       
  • Getting Access: Processes for Hub creation and decommissioning are a work in progress.
    • Currently, this is handled by a request to DRI. 
       
  • Canvas + Hubs: 'Connecting' a Hub to a Canvas course requires some setup steps within the Canvas course settings.
    • This can be done by the instructor, or by adding a support person as a "Designer" who can make the connection (and edit course content, but cannot see FERPA-protected information such as grades and discussions). 
       
  • Computation Limits: New! Previously, users were limited to 2 GB of RAM and 2 CPU cores, but we are open-testing a time-based quota system for cost-effective use of larger compute resources in the cloud, including GPU compute (with TensorFlow 2 installed).
  • Data Storage Limits: Data storage is designed as a single shared space for all users within a single Hub.
    • This space defaults to 40 GB, but we can allocate larger spaces (up to 500 GB) on request during Hub creation.
       
  • Data Storage Sharing: There are currently no per-user storage limits within the shared space (this is a difficult challenge we're working on). Running out of space will prevent the creation of new files.
    • Deleting data frees the space again. We are working on tools for Instructors and TAs to identify "data hogs".
       
  • Data Retention: Long-term data storage and management is not supported at this time.
    • We can create a Hub up to a month prior to a course for instructors to stage data and packages, but require all data to be removed within 2 weeks after the end of a course. 
       
  • Backups and Recovery: 
    • At this time, data on Hub instances cannot be restored. We're working to enable backups in the near future.
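
Until dedicated tooling exists, an Instructor or TA could get a rough view of who is using the shared space with a short script. The following is a minimal sketch, not a DS @ OSU tool. It assumes that each student's home directory is a subdirectory of one shared root that the instructor can read; that path is not given on this page and must be supplied by the caller. The function names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def directory_size(path: Path) -> int:
    """Total size in bytes of regular files under `path` (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            if file_path.is_symlink():
                continue
            try:
                total += file_path.stat().st_size
            except OSError:
                # Unreadable or vanished files are ignored in this sketch.
                pass
    return total


def largest_home_dirs(share_root: Path, top: int = 5) -> list[tuple[str, int]]:
    """Return up to `top` (directory name, bytes) pairs, largest first."""
    sizes = [
        (child.name, directory_size(child))
        for child in share_root.iterdir()
        if child.is_dir()
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]


def free_bytes(share_root: Path) -> int:
    """Bytes still free on the filesystem that holds `share_root`."""
    return shutil.disk_usage(share_root).free
```

Running `largest_home_dirs(Path(...))` with the Hub's shared root lists the heaviest directories, and `free_bytes` shows how close the Hub is to the point where new files can no longer be created.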