VADSTI21

Virtual Applied Data Science Training Institute (VADSTI)

Data Science Approaches to Better Understand Clinical and Genomic Informatics
February 11 – April 30, 2021 | An 8-week data science training series in a virtual setting

About VADSTI

With recent advances in technology and computational tools, healthcare services and the clinical and genomic sciences can now store large amounts of data. There is therefore an increased demand for researchers to use data analytics to examine recent trends, predict outcomes, and make better clinical and health policy decisions. Data science skill sets are critical for advancing the science of minority health and health disparities. The Howard University Research Centers in Minority Institutions (RCMI) Program, with funding from the NIH, created VADSTI to meet the growing demand for data science and its application to problems of minority health and health disparities.

The mission of VADSTI is to advance education and research by providing training in the foundations of programming and in the critical data analytic skills needed to plan and conduct research that involves big data. Our aim is to attract and engage underrepresented students and researchers in applying data science to biomedical, clinical, and genomic research, with a focus on diseases common in minority populations. VADSTI draws faculty with complementary expertise in the conduct and application of data science from different institutions and, in partnership with the NIH Office of Data Science Strategy, brings them together to launch a comprehensive 8-week training program in a virtual environment. VADSTI 2021 is an 8-week training series to be run every other week.

Program Objectives & Competencies

The primary objective of the 2021 VADSTI program is to provide training in the foundations of data science, advance analytic skills, and introduce tools for clinical and genomic research. Over the course of the 8-week training program you will:

  • Be introduced to the principles of data science.
  • Gain practical, hands-on experience with Python and related libraries for accessing data from multiple sources and applying analytic methods.
  • Learn about the underlying concepts of probability and statistics.
  • Be introduced to advanced statistical analytic techniques utilized in biomedical, clinical and genomic research.
  • Understand the concepts of data partitioning and the practice behind supervised and unsupervised learning.
  • Be introduced to advanced algorithmic techniques including machine learning and deep learning.
  • Be introduced to tools for applied data science using cloud-based platforms for clinical and genomic research.
  • Learn from experts on current research topics in biomedical, clinical, and genomic applications.

Certificate of Completion: Participants who complete 6 of the 8 modules will receive a verified Certificate of Completion from VADSTI.

Evaluation: At the end of each training module, you will be asked to complete an electronic feedback form on the extent to which expectations and objectives were met.

Registration & Fees: Participation is free, but registration is required to attend.

VADSTI Training Program Schedule

No prerequisites are required for the research knowledge topics.
Basic undergraduate knowledge of algebra and probability is recommended for the content knowledge topics.

The training series consists of 8 modules.
The complete schedule and module descriptions are detailed below.

Module 1
Foundations of Data Science with Python

Thursday, February 11, 2021 & 
Friday, February 12, 2021
11:00 AM – 1:30 PM EST

Module 2
Data Exploration & Visualization

Thursday, February 18, 2021 & 
Friday, February 19, 2021
11:00 AM – 1:30 PM EST

Module 3
Statistical Distributions, Sampling, & Hypothesis Testing

Thursday, March 4, 2021 & 
Friday, March 5, 2021
11:00 AM – 1:30 PM EST

Module 4 
Cluster Analysis

Thursday, March 18, 2021 & 
Friday, March 19, 2021
11:00 AM – 1:30 PM EDT

Module 5
Neural Network Models

Thursday, April 1, 2021 & 
Friday, April 2, 2021
11:00 AM – 1:30 PM EDT

Module 6
Classification Models

Thursday, April 15, 2021 &
Friday, April 16, 2021
11:00 AM – 1:30 PM EDT

Module 7 
Tools for Applied Data Science Using Cloud-Based Platforms

Thursday, April 22, 2021 &
Friday, April 23, 2021
11:00 AM – 1:30 PM EDT

Module 8 
Current Research Topics Seminar: Biomedical, Clinical & Genomic Application

8.1 – Thursday, March 11, 2021
8.2 – Thursday, March 25, 2021
8.3 – Thursday, April 8, 2021
8.4 – Thursday, April 29, 2021 

VADSTI Training Program Curriculum

Details for each module are given below.

Module 1
Foundations of Data Science with Python

Thursday, February 11, 2021 & Friday, February 12, 2021
11:00 AM – 1:30 PM EST

INSTRUCTOR: Prem Saggar, MS

This module will introduce you to the core principles of data science, Python programming, and associated libraries. You will learn how to use Jupyter notebooks and gain an understanding of what data science and AI can currently do. An overview of state-of-the-art methods will be presented, and real-life examples from clinical and healthcare data will be used for illustration.

Module 2
Data Exploration & Visualization

Thursday, February 18, 2021 & Friday, February 19, 2021
11:00 AM – 1:30 PM EST

INSTRUCTOR: Anas Belouali, MS, MEng

This module provides recipes for exploratory data analysis and data visualization, which are critical steps in any data science project. The goal of this module is to learn how to visualize and perform initial investigations of the data in order to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations. We will use Python to explore, filter, and manipulate the UCI diabetes dataset; identify data anomalies and missingness; impute missing data; and identify highly correlated variables.
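For readers new to these steps, the sketch below shows what mean imputation and a correlation screen can look like in plain Python using only the standard library. It is an illustration under stated assumptions rather than the module's own code: missing values are assumed to be encoded as `None`, mean imputation is only one of several imputation strategies, and the column names, values, and the 0.9 threshold are hypothetical.

```python
from math import sqrt
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Hypothetical toy columns; column "a" has one missing value.
columns = {
    "a": impute_mean([1.0, 2.0, None, 4.0, 5.0]),
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
print(correlated_pairs(columns))
```

Only the `a`/`b` pair is flagged here; the other pairs have r = -0.8. In practice, pandas offers equivalent operations (`DataFrame.fillna`, `DataFrame.corr`); the standard-library version simply makes each step explicit.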

We will also explore the Johns Hopkins University COVID-19 data repository, importing and wrangling the data to look at the number of reported confirmed cases by country and region, and plotting the number of reported confirmed cases and deaths by country. In addition, we will use the COVID Tracking Project dataset to explore racial disparities in COVID-19 mortality and infections in the US.
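The country-level aggregation described above can be sketched with the standard-library `csv` module. This assumes the wide layout of the JHU CSSE global time-series files (e.g. `time_series_covid19_confirmed_global.csv`): columns `Province/State`, `Country/Region`, `Lat`, `Long`, then one cumulative-count column per date. The inline rows and country names are hypothetical, not real case counts.

```python
import csv
import io
from collections import defaultdict


def latest_totals_by_country(csv_text):
    """Sum the most recent cumulative count for each Country/Region.

    Countries reported at the Province/State level contribute one row per
    province, so their rows are added together.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    country_col = header.index("Country/Region")
    totals = defaultdict(int)
    for row in reader:
        # The last column holds the cumulative count for the latest date.
        totals[row[country_col]] += int(row[-1])
    return dict(totals)


# Hypothetical rows in the assumed layout (not real data).
sample = (
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    ",CountryA,0.0,0.0,1,3\n"
    "North,CountryB,0.0,0.0,2,5\n"
    "South,CountryB,0.0,0.0,0,4\n"
)
print(latest_totals_by_country(sample))  # {'CountryA': 3, 'CountryB': 9}
```

Summing the last column rather than every date column is deliberate: the series are cumulative, so the latest column already holds the running total.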

Module 3
Statistical Distributions, Sampling, & Hypothesis Testing

Thursday, March 4, 2021 & Friday, March 5, 2021
11:00 AM – 1:30 PM EST

INSTRUCTOR: Nawar Shara, PhD

This module will introduce you to basic probability and statistical concepts. You will learn about different descriptive and inferential statistical techniques that are used in data science, as well as commonly used statistical distribution functions, including the Binomial, Poisson, and Normal distributions. You will learn how to formulate hypothesis statements and select appropriate statistical techniques for testing them.
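To make the distributions and hypothesis-testing workflow concrete, here is a small standard-library sketch of the Binomial and Poisson probability mass functions, the Normal CDF, and a two-sided one-sample z-test. It assumes the population standard deviation is known (the z-test setting); when the standard deviation is estimated from the sample, a t-test would normally be used instead. The numbers in the example are hypothetical.

```python
from math import comb, erf, exp, factorial, sqrt


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma^2), computed via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided test of H0: mu = mu0 against H1: mu != mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - normal_cdf(abs(z)))
    return z, p_value


print(binomial_pmf(2, 4, 0.5))  # 0.375
print(one_sample_z_test(sample_mean=103, mu0=100, sigma=15, n=100))
```

In the hypothetical test, z = 3 / (15 / 10) = 2.0 and the p-value is about 0.0455, so H0 would be rejected at the 0.05 significance level.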

Module 4
Cluster Analysis

Thursday, March 18, 2021 & Friday, March 19, 2021
11:00 AM – 1:30 PM EDT

INSTRUCTOR: Martin Skarzynski, PhD

This module will discuss data description and clustering, including similarity measures and dimensionality reduction. You will learn about the k-means algorithm and hierarchical clustering. Unsupervised learning with a clinical and/or genomic dataset will be used for illustration.
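The k-means algorithm alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of the points assigned to it. A minimal standard-library sketch of this procedure (Lloyd's algorithm) follows; the random initialization, the iteration cap, and the toy points are assumptions for illustration, and hierarchical clustering is not covered here.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (tuples of numbers) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters


# Two hypothetical, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Because k-means depends on the starting centroids, practical implementations such as scikit-learn's `KMeans` run several initializations (`n_init`) and keep the best result.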

Module 5
Neural Network Models

Thursday, April 1, 2021 & Friday, April 2, 2021
11:00 AM – 1:30 PM EDT

INSTRUCTOR: Gloria Washington, PhD

The module will introduce you to machine learning and neural networks. Supervised learning, linear models for regression, and basic neural network structure will be discussed. The expected competencies are to (i) understand the motivation and functioning of the most common types of deep neural networks, (ii) understand the choices and limitations of a model for a given setting, (iii) apply deep learning techniques to practical problems, and (iv) critically evaluate model performance and interpret results. You will learn to design neural network architectures and training procedures through hands-on activities.
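A neural network is built from units that compute a weighted sum of their inputs plus a bias and pass it through a nonlinear activation. As a minimal sketch of that building block (not the module's own material), the code below trains a single sigmoid neuron, which is equivalent to logistic regression, with batch gradient descent on the log loss. The learning rate, epoch count, and the AND-gate toy data are illustrative assumptions; deeper networks stack many such units and are usually trained with a framework rather than by hand.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def train_neuron(X, y, lr=0.5, epochs=5000):
    """Train one sigmoid neuron with batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Forward pass: weighted sum plus bias, then the sigmoid activation.
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the pre-activation
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Gradient step using the average gradient over the batch.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return class 1 if the neuron's output is at least 0.5, else class 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]  # logical AND, a linearly separable toy problem
w, b = train_neuron(X, y)
print([predict(w, b, x) for x in X])  # [0, 0, 0, 1]
```

A single neuron can only separate classes with a straight line (a hyperplane in higher dimensions); problems such as XOR need at least one hidden layer.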

Module 6
Classification Models

Thursday, April 15, 2021 & Friday, April 16, 2021
11:00 AM – 1:30 PM EDT

INSTRUCTOR: Prem Saggar, MS

You will learn about classification methods and how they differ from rule-based systems, including discriminant analysis and the k-nearest neighbor classifier. Working examples are illustrated with classification trees and neural network models.
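The k-nearest neighbor classifier predicts a label by majority vote among the k training points closest to the query. A minimal standard-library sketch follows, assuming Euclidean distance, k = 3, and hypothetical two-feature training points; discriminant analysis and classification trees are not shown.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query and keep k.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    # Counter.most_common breaks ties by first occurrence, i.e. the nearer neighbor.
    return votes.most_common(1)[0][0]


# Hypothetical training data: two labeled groups of 2-D points.
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (0.5, 0.5)))  # A
print(knn_predict(train_X, train_y, (5.5, 5.2)))  # B
```

Unlike a rule-based system, no decision rules are written by hand: the training examples themselves define the decision boundary, and the choice of k trades off sensitivity to noise against smoothing.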

Module 7
Tools for Applied Data Science Using Cloud-Based Platforms

Thursday, April 22, 2021 & Friday, April 23, 2021
11:00 AM – 1:30 PM EDT

INSTRUCTOR: NHGRI AnVIL Team
 
The AnVIL is a cloud-based platform that supports the management, analysis, and sharing of biomedical data for the NHGRI research community. It aims to advance our basic understanding of the genetic basis of complex traits and to accelerate the discovery and development of therapies, diagnostic tests, and other technologies for diseases such as cardiovascular disease and autism spectrum disorders. The platform currently hosts more than 75,000 whole human genome datasets and offers a variety of analysis capabilities, including:

  • Terra for large-scale batch computing and interactive computing.
  • Gen3 for managing, analyzing, harmonizing, and sharing large datasets.
  • Dockstore for sharing Docker-based analysis workflows.
  • Jupyter notebooks for organizing live code, equations, visualizations, and narrative text into a single document.
  • RStudio for interactive machine learning, statistical computing, and visualization.
  • Bioconductor for community-driven interactive genomics with R.
  • Galaxy for accessible, reproducible, and transparent genomic science.

In this module, you will be introduced to the platform and to its tools and functionality for data science projects.
 

Module 8
Current Research Topics Seminar: Biomedical, Clinical & Genomic Application

8.1 Data Science Applications to COVID-19 & Data Mining in Higher Education | Thursday, March 11, 2021
11:00 AM – 1:00 PM EST

INSTRUCTOR: Prem Saggar, MS

8.2 What You Can Learn About Human Gene Expression When Making 70,000 RNA-seq Samples Easy to Use | Thursday, March 25, 2021
11:00 AM – 1:00 PM EDT

INSTRUCTOR: Jeff Leek, PhD

8.3 Data Science & Ethical Consideration | Thursday, April 8, 2021
11:00 AM – 1:00 PM EDT

INSTRUCTOR: Rochelle Tractenberg, PhD

8.4 Finally Finishing the Human Genome | Thursday, April 29, 2021
11:00 AM – 1:00 PM EDT

INSTRUCTOR: Adam Phillippy, PhD