Virtual Applied Data Science Training Institute (VADSTI)
Data Science Approaches to Better Understand Clinical and Genomic Informatics
About VADSTI
Recent advances in technology and computational tools allow healthcare services and the clinical and genomic sciences to store large amounts of data. There is therefore increased demand for researchers who can use data analytics to examine recent trends, predict outcomes, and make better clinical and health policy decisions. Skill sets in data science are critical for advancing the science of minority health and health disparities. The Howard University Research Centers in Minority Institutions (RCMI) Program, with funding from NIH, created VADSTI to meet the growing demand for data science and its application to problems of minority health and health disparities.
The mission of VADSTI is to advance education and research by providing training in the foundations of programming and the critical data analytic skills for planning and conducting research that involves big data. Our aim is to attract and engage underrepresented students and researchers in the application of data science to biomedical, clinical, and genomic research, with a focus on diseases common to minority populations. VADSTI draws faculty with complementary expertise in the conduct and application of data science from across different institutions and, in partnership with the NIH Office of Data Science Strategy, is launching an 8-week comprehensive training in a virtual environment. VADSTI 2021 is an 8-week training series to be run every other week.
Program Objectives & Competencies
The primary objective of the 2021 VADSTI program is to provide training in the foundations of data science, advance analytic skills, and introduce tools for clinical and genomic research. Over the course of the 8-week training program, you will:
- Be introduced to the principles of data science.
- Gain practical, hands-on experience with Python and related libraries for accessing data from multiple sources and applying analytic methods.
- Learn about the underlying concepts of probability and statistics.
- Be introduced to advanced statistical analytic techniques utilized in biomedical, clinical and genomic research.
- Understand the concepts of data partitioning and the practice behind supervised and unsupervised learning.
- Be introduced to advanced algorithmic techniques including machine learning and deep learning.
- Be introduced to tools for applied data science using cloud-based platforms for clinical and genomic research.
- Learn from experts on current research topics in biomedical, clinical, and genomic application.
Certificate of Completion: Participants who complete 6 of the 8 modules will receive a verified Certificate of Completion from VADSTI.
Evaluation: At the end of each training module, you will be asked to complete an electronic feedback form on the extent to which expectations and objectives were met.
Registration & Fees: There are no fees for participation, but registration is required to attend.
VADSTI Training Program Schedule
No prerequisite for research knowledge topics.
Basic undergraduate knowledge of algebra and probability recommended for content knowledge topics.
The training series consists of 8 modules.
The complete schedule and module descriptions are detailed below.
Module 1
Foundations of
Data Science with Python
Thursday, February 11, 2021 &
Friday, February 12, 2021
11:00 AM – 1:30 PM EST
Module 2
Data Exploration &
Visualization
Thursday, February 18, 2021 &
Friday, February 19, 2021
11:00 AM – 1:30 PM EST
Module 3
Statistical Distributions, Sampling, & Hypotheses Testing
Thursday, March 4, 2021 &
Friday, March 5, 2021
11:00 AM – 1:30 PM EST
Module 4
Cluster Analysis
Thursday, March 18, 2021 &
Friday, March 19, 2021
11:00 AM – 1:30 PM EDT
Module 5
Neural Networks Models
Thursday, April 1, 2021 &
Friday, April 2, 2021
11:00 AM – 1:30 PM EDT
Module 6
Classification Models
Thursday, April 15, 2021 &
Friday, April 16, 2021
11:00 AM – 1:30 PM EDT
Module 7
Tools for Applied Data Science Using Cloud-Based Platform
Thursday, April 22, 2021 &
Friday, April 23, 2021
11:00 AM – 1:30 PM EDT
Module 8
Current Research Topics Seminar: Biomedical, Clinical & Genomic Application
8.1 – Thursday, March 11, 2021
8.2 – Thursday, March 25, 2021
8.3 – Thursday, April 8, 2021
8.4 – Thursday, April 29, 2021
VADSTI Training Program Curriculum
Here are details for each of the modules
Module 1: Foundations of Data Science with Python
Thursday, February 11, 2021 & Friday, February 12, 2021
11:00 AM – 1:30 PM EST
INSTRUCTOR: Prem Saggar, MS
This module will introduce you to the core principles of data science, Python programming, and associated libraries. You will learn how to use Jupyter notebooks and gain an understanding of what data science and AI can currently do. An overview of state-of-the-art methods will be presented, with real-life examples from clinical and healthcare data used for illustration.
Module 2: Data Exploration & Visualization
Thursday, February 18, 2021 & Friday, February 19, 2021
11:00 AM – 1:30 PM EST
INSTRUCTOR: Anas Belouali, MS, MEng
This module provides recipes for exploratory data analysis and data visualization, which are critical steps in any data science project. The goal of this module is to learn how to visualize and perform initial investigations of the data so as to discover patterns, spot anomalies, test hypotheses, and check assumptions with the help of summary statistics and graphical representations. We will use Python to explore, filter, and manipulate the UCI diabetes dataset; identify data anomalies and missingness; learn how to impute missing data; and identify highly correlated variables.
We will also explore the Johns Hopkins University COVID-19 data repository: import and wrangle the data to look at the number of reported confirmed cases by country and region, and plot the number of reported confirmed cases and deaths by country. In addition, we will use the COVID-19 tracking project dataset to explore racial disparities in COVID-19 mortality and infections in the US.
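To give a flavor of the missing-data and correlation steps mentioned for this module, here is a minimal sketch using only the Python standard library. It is not taken from the course materials: the records, column names, and helper functions (missing_counts, impute_median, pearson) are hypothetical, and median imputation is just one of several possible strategies.

```python
from statistics import mean, median

# Hypothetical toy records (NOT the UCI data); None marks a missing value.
records = [
    {"glucose": 148, "bmi": 33.6, "age": 50},
    {"glucose": 85, "bmi": None, "age": 31},
    {"glucose": 183, "bmi": 23.3, "age": 32},
    {"glucose": None, "bmi": 28.1, "age": 21},
    {"glucose": 137, "bmi": 43.1, "age": 33},
]


def missing_counts(rows):
    """Count missing (None) entries per column."""
    cols = rows[0].keys()
    return {c: sum(r[c] is None for r in rows) for c in cols}


def impute_median(rows):
    """Return a copy of rows with each None replaced by the column median."""
    cols = rows[0].keys()
    med = {c: median(r[c] for r in rows if r[c] is not None) for c in cols}
    return [{c: (med[c] if r[c] is None else r[c]) for c in cols} for r in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


counts = missing_counts(records)   # how many values are missing per column
filled = impute_median(records)    # dataset with gaps filled in
r_glucose_age = pearson(
    [r["glucose"] for r in filled], [r["age"] for r in filled]
)
```

A correlation close to +1 or -1 between two columns would flag them as highly correlated; with real data, a library such as pandas would typically handle these steps on whole tables at once.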
Module 3: Statistical Distributions, Sampling, & Hypotheses Testing
Thursday, March 4, 2021 & Friday, March 5, 2021
11:00 AM – 1:30 PM EST
INSTRUCTOR: Nawar Shara, PhD
This module will introduce you to basic probability and statistical concepts. You will learn about the descriptive and inferential statistical techniques used in data science and about commonly used statistical distributions, including the Binomial, Poisson, and Normal distributions. You will also learn the appropriate way to formulate hypothesis statements and select appropriate statistical techniques for testing.
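As a rough illustration of the distributions and hypothesis testing named above, the sketch below computes Binomial and Poisson probabilities and runs a two-sided one-sample z-test with the Python standard library. The function names and the numbers in the example are assumptions chosen for illustration, not course content, and the z-test assumes the population standard deviation is known.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


def z_test_two_sided(sample_mean, mu0, sigma, n):
    """One-sample z-test of H0: mean == mu0 with known sigma.

    Returns the z statistic and the two-sided p-value.
    """
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical numbers: 3 successes in 10 fair trials, zero events at rate 2,
# and a sample of 100 with mean 103 tested against mu0 = 100, sigma = 15.
p_three_heads = binomial_pmf(3, 10, 0.5)
p_no_events = poisson_pmf(0, 2.0)
z, p_value = z_test_two_sided(103, 100, 15, 100)
```

With these made-up inputs the z statistic is 2.0, so the null hypothesis would be rejected at the 5% level but not at the 1% level.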
Module 4: Cluster Analysis
Thursday, March 18, 2021 & Friday, March 19, 2021
11:00 AM – 1:30 PM EDT
INSTRUCTOR: Martin Skarzynski, PhD
This module will discuss data description and clustering, including similarity measures and dimensionality reduction. You will learn about the k-means algorithm and hierarchical clustering. Unsupervised learning with a clinical and/or genomic dataset will be used for illustration.
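For readers new to clustering, the k-means algorithm mentioned above can be sketched in a few lines of plain Python. This is an illustrative version of the standard Lloyd iteration under simplifying assumptions (fixed starting centroids, squared Euclidean distance, made-up 2-D points); the function name kmeans and the data are hypothetical and not drawn from the course.

```python
def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(
                range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
```

In practice the starting centroids are usually chosen at random and the algorithm is run several times, since the result can depend on initialization.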
Module 5: Neural Networks Models
Thursday, April 1, 2021 & Friday, April 2, 2021
11:00 AM – 1:30 PM EDT
INSTRUCTOR: Gloria Washington, PhD
The module will introduce you to machine learning and neural networks. Supervised learning, linear models for regression, and basic neural network structure will be discussed. The expected competencies are to (i) understand the motivation and functioning of the most common types of deep neural networks, (ii) understand the choices and limitations of a model for a given setting, (iii) apply deep learning techniques to practical problems, and (iv) critically evaluate model performance and interpret results. You will learn to design neural network architectures and training procedures through hands-on activities.
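The basic building block of a neural network is a single neuron: a weighted sum of inputs passed through an activation function. The sketch below trains one sigmoid neuron with batch gradient descent on a toy problem (logical OR). It is a simplified illustration, not a deep network and not course code; the function names, learning rate, and epoch count are assumptions.

```python
from math import exp


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-z))


def predict(weights, bias, x):
    """Forward pass of one neuron: weighted sum followed by sigmoid."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(inputs, targets, epochs=2000, lr=0.5):
    """Fit one sigmoid neuron with batch gradient descent on cross-entropy loss."""
    weights = [0.0] * len(inputs[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(inputs, targets):
            error = predict(weights, bias, x) - y  # dLoss/d(pre-activation)
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        # Step against the gradient accumulated over the whole batch.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy task: learn logical OR from its four input/output pairs.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 1]
weights, bias = train_neuron(inputs, targets)
predictions = [round(predict(weights, bias, x)) for x in inputs]
```

Deep networks stack many such units in layers and compute the gradients with backpropagation, usually through a dedicated library rather than by hand.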
Module 6: Classification Models
Thursday, April 15, 2021 & Friday, April 16, 2021
11:00 AM – 1:30 PM EDT
INSTRUCTOR: Prem Saggar, MS
You will learn about classification methods and how they differ from rules-based systems, including discriminant analysis and the k-nearest neighbor classifier. Working examples are illustrated with classification trees and neural network models.
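To make the k-nearest neighbor idea concrete, here is a minimal sketch in standard-library Python: a query point receives the majority label among its k closest training points. The function name knn_predict, the choice of Euclidean distance, and the toy data are assumptions for illustration only.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance to the query
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per sample, two classes.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_labels = ["low", "low", "low", "high", "high", "high"]

label_a = knn_predict(train_points, train_labels, (1.1, 1.0))
label_b = knn_predict(train_points, train_labels, (2.9, 3.1))
```

Unlike a rules-based system, nothing here encodes an explicit decision rule: the prediction comes entirely from the labeled examples, and an odd value of k helps avoid tied votes when there are two classes.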
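Module 7: Tools for Applied Data Science Using Cloud-Based Platform
Thursday, April 22, 2021 & Friday, April 23, 2021
11:00 AM – 1:30 PM EDT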
INSTRUCTOR: NHGRI AnVIL Team
Module 8: Current Research Topics Seminar: Biomedical, Clinical & Genomic Application
8.1 Data Science Applications to Covid-19 & Data Mining in Higher Education | Thursday, March 11, 2021
11:00 AM – 1:00 PM EST
INSTRUCTOR: Prem Saggar, MS
8.2 What You Can Learn About Human Gene Expression When Making 70,000 RNA-seq Samples Easy to Use | Thursday, March 25, 2021
11:00 AM – 1:00 PM EDT
INSTRUCTOR: Jeff Leek, PhD
8.3 Data Science & Ethical Consideration | Thursday, April 8, 2021
11:00 AM – 1:00 PM EDT
INSTRUCTOR: Rochelle Tractenberg, PhD
8.4 Finally Finishing the Human Genome | Thursday, April 29, 2021
11:00 AM – 1:00 PM EDT
INSTRUCTOR: Adam Phillippy, PhD