Sangeet Chandaliya
Context
In my previous post, I decided to spend the next 2 weeks extracting data from 2-3 course providers and storing it in a standardized format as part of Project Manabu. In this post, I dive deeper into the progress made on each of these fronts and plan the next steps.
#1 Shortlisting course providers
For shortlisting course providers, I asked the following questions (in order) -
- Are “solo” course creators actively using the platform for marketing their courses?
- How many visitors do these platforms receive?
- What course-related data points are available for analyzing performance?
- How difficult would it be to extract the course parameters?
- Would I need to break their terms & conditions for bulk-downloading course parameters?
Here is a detailed comparison of the platforms -
Conclusion: Even though I had decided to target at least 3 platforms, limiting the focus to Udemy and YouTube would be sufficient for the first version of Project Manabu.
#2 Downloading data from the course providers
a. Udemy - Since Udemy has made its APIs available for public use, I used the following 3 to estimate a course’s sales -
- Search API - For extracting basic course details (excluding price) from a specified category,
- Course API - For extracting a specific course’s curriculum and pricing details, and
- Reviews API - For extracting how fast the latest 10,000 reviews for the specific course were added.
Here are the exact parameters I extracted -
- Search API - Avg. course rating, avg. recent rating, badges, available languages, content duration, category, subcategory, creation date, title, caption, ID, level of difficulty, number of lectures, number of practice tests, number of reviews, number of subscribers, URL, and visible instructors.
- Course API - List price, discounted price, discount percentage, device access, lifetime availability, certificate availability, section, and lecture details.
- Reviews API - Individual review’s ID, content, rating, and date of creation.
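To make the extraction flow concrete, here is a minimal sketch of how the Search API can be queried. The base URL and the `fields[course]` filter follow Udemy's public (Affiliate) API conventions, but the exact field names and auth setup should be verified against Udemy's API docs before use -

```python
import json
import urllib.parse
import urllib.request

UDEMY_API = "https://www.udemy.com/api-2.0"

def search_params(category, page=1, page_size=100):
    """Build query parameters for the Search (course list) API."""
    return {
        "category": category,
        "page": page,
        "page_size": page_size,
        # Request only the fields needed for sales estimation.
        # Field names here are assumptions; verify against the API docs.
        "fields[course]": ",".join([
            "title", "headline", "url", "created",
            "num_reviews", "num_subscribers",
            "avg_rating", "avg_rating_recent",
            "content_info", "instructional_level", "visible_instructors",
        ]),
    }

def fetch_page(category, page, auth_header):
    """Fetch one page of search results (needs real API credentials)."""
    query = urllib.parse.urlencode(search_params(category, page))
    req = urllib.request.Request(
        f"{UDEMY_API}/courses/?{query}",
        headers={"Authorization": auth_header},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"]
```

The Course and Reviews APIs follow the same pattern, just with the course ID in the path and their own field filters.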
b. YouTube - After reviewing the platform, I realized that it would be easier to use a pre-existing API such as https://rapidapi.com/DataFanatic/api/youtube-media-downloader/ to fetch videos and their related metrics.
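For reference, all RapidAPI-hosted APIs share the same auth pattern (an `X-RapidAPI-Key` / `X-RapidAPI-Host` header pair), so the call looks roughly like the sketch below. The `/v2/video/details` path and `videoId` parameter are assumptions about this particular API - check its RapidAPI docs for the actual endpoints -

```python
import json
import urllib.parse
import urllib.request

RAPIDAPI_HOST = "youtube-media-downloader.p.rapidapi.com"

def rapidapi_request(path, params, api_key):
    """Build a request with the standard RapidAPI auth headers."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(
        f"https://{RAPIDAPI_HOST}{path}?{query}",
        headers={
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": RAPIDAPI_HOST,
        },
    )

def video_details(video_id, api_key):
    # Hypothetical endpoint path and parameter name; verify in the API docs.
    req = rapidapi_request("/v2/video/details", {"videoId": video_id}, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```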
Conclusion: Successfully downloaded sample course and review data for ~40K Udemy courses, and tested the YouTube API for extracting the required data points.
#3 Standardizing course parameters
After reviewing the parameters available for each Udemy course and YouTube video, I created the following set of standard course parameters (each parameter represents a column in the course_ingestion_db table) -
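To illustrate the idea, here is a sketch of what one row of `course_ingestion_db` could look like as a Python dataclass. The column names below are illustrative, drawn from the parameters listed above, not the actual schema -

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class CourseRecord:
    """One row in course_ingestion_db (illustrative subset of columns)."""
    provider: str                        # "udemy" or "youtube"
    course_id: str                       # provider-specific ID
    title: str
    url: str
    created_at: str                      # ISO-8601 creation date
    avg_rating: Optional[float] = None
    num_reviews: Optional[int] = None
    num_learners: Optional[int] = None   # subscribers (Udemy) / views (YouTube)
    list_price: Optional[float] = None   # None for free YouTube content
    duration_minutes: Optional[float] = None
```

Making provider-specific fields optional (with `None` defaults) lets both Udemy courses and YouTube videos map onto the same table without forcing fake values.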
Targets for weeks 3-4
- Start creating user personas, along with queries they might be interested in. Using these queries as well as the sample data, build the first set of dashboards in Google Spreadsheet.
- Apart from course parameters, I’d also need to standardize parameters for storing reviews / comments, and creator / instructor details.
That’s it for this week! I look forward to continuing to build.