Sorting Coursera Course List by duration


With the lockdown going on, I was bored. Working from home, playing with the kids and playing Injustice have become something of a routine. In the first weeks of the lockdown I spent my spare time watching Arrow on Airtel Xstream, until it suddenly disappeared from the platform. While I was wondering what to do next, Coursera announced free courses. I saw a cloud computing 101 course and thought it would refresh my memory on the subject. It was listed as 3 weeks long, but with bite-sized videos and mostly theory, I finished it in about 1.5 to 2 weeks. So I thought I would look for other short courses.

The Coursera catalog page has a list of categories. Once you select a category from the filters, you can also select a duration and get a list of courses within that duration. However, you cannot sort the results by duration.

I first checked whether Coursera offers an API. It does, but you have to join their affiliate program to use it. Since I only wanted to sort courses by duration, that was not going to help.

So I decided to look at what happens behind the scenes when I set a filter, then reuse that request to get the data myself and sort it separately.

So what happens behind the scenes

Checking with the browser dev tools, I observed the following: when you select a filter and click Apply Filters, a POST request is sent to this URL:

https://www.coursera.org/graphqlBatch?opname=catalogResultQuery

That doesn't seem right, does it? The filter information is not in the URL; it is sent as POST data instead. The payload was a huge JSON document with all the information, including a large query that I guess the Coursera backend uses to fetch the required data. The part we are interested in, though, is the variables object:

{
  "limit":30,
  "facets":[
    "skillNameMultiTag",
    "jobTitleMultiTag",
    "difficultyLevelTag",
    "languages",
    "productDurationEnum:1-4 Weeks",
    "entityTypeTag",
    "partnerMultiTag",
    "categoryMultiTag:information-technology",
    "subcategoryMultiTag"
  ],
  "sortField":"",
  "start":"0",
  "skip":false
}

The values we need to manipulate are limit, start and the "categoryMultiTag:information-technology" entry inside the facets list. limit and start control pagination: limit is the number of courses returned in a single request, and start is where the current page begins. categoryMultiTag holds the course category we are interested in, and the productDurationEnum entry holds the duration filter. Making a request to this URL returns a JSON response containing the course information and the pagination details.
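As a rough sketch of the request side (in Python, which is an assumption here; the gist may be written differently), the variables object can be rebuilt with the category, duration and paging values filled in. The shape of the batch body (a JSON list with one operation holding operationName, variables and query) is an assumption, and the long query string is not reproduced: copy it from dev tools. build_request only constructs the request; it does not send it.

```python
import json
import urllib.request

# Endpoint observed in dev tools.
GRAPHQL_URL = "https://www.coursera.org/graphqlBatch?opname=catalogResultQuery"


def build_variables(category, duration="1-4 Weeks", start=0, limit=30):
    """Return the `variables` object with category, duration and paging filled in."""
    return {
        "limit": limit,
        "facets": [
            "skillNameMultiTag",
            "jobTitleMultiTag",
            "difficultyLevelTag",
            "languages",
            f"productDurationEnum:{duration}",  # duration filter
            "entityTypeTag",
            "partnerMultiTag",
            f"categoryMultiTag:{category}",  # category filter
            "subcategoryMultiTag",
        ],
        "sortField": "",
        # The captured payload sends start as a string, so keep that type.
        "start": str(start),
        "skip": False,
    }


def build_request(query, variables):
    """Build (but do not send) the POST request.

    ASSUMPTION: the batch body is a JSON list with a single operation using
    operationName/variables/query keys; `query` must be copied from dev tools.
    """
    body = [{
        "operationName": "catalogResultQuery",
        "variables": variables,
        "query": query,
    }]
    return urllib.request.Request(
        GRAPHQL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would then be a call such as urllib.request.urlopen(build_request(query, build_variables("information-technology"))), followed by json.load on the response.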

The pagination details are under ["data"]["CatalogResultsV2Resource"]["browseV2"]["paging"], which contains the total number of courses and the start value for the next request.
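Reading the paging block could look like the sketch below. The key names "total" and "next" are assumptions: the post only says the block holds the total count and the next start value, so check the actual names in the dev-tools response.

```python
def get_paging(response):
    """Return (total, next_start) from a parsed catalogResultQuery response.

    ASSUMPTION: paging uses the keys "total" and "next", and "next" is
    missing on the last page (so next_start comes back as None).
    """
    paging = response["data"]["CatalogResultsV2Resource"]["browseV2"]["paging"]
    return paging.get("total", 0), paging.get("next")
```

A None next_start is the signal to stop requesting further pages.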

The course list is under ["data"]["CatalogResultsV2Resource"]["browseV2"]["elements"][0]["courses"]["elements"].
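Pulling the course list out of a parsed response is just a walk down that path. In this minimal sketch the helper name get_courses is hypothetical, and returning an empty list when browseV2 has no elements is an assumption about how an empty page looks.

```python
def get_courses(response):
    """Return the list of course objects from one parsed response."""
    browse = response["data"]["CatalogResultsV2Resource"]["browseV2"]
    elements = browse["elements"]
    # ASSUMPTION: an empty page shows up as an empty "elements" list.
    if not elements:
        return []
    return elements[0]["courses"]["elements"]
```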

Now that we know what to send, what to expect in the response and where to find the data in it, I wrote a script to recursively fetch all courses of a category and duration and dump them to a CSV file.
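For illustration, here is a minimal sketch of the fetch-and-sort step. It is not the script from the gist: it uses a plain loop instead of recursion, and it takes a hypothetical fetch_page(start, limit) callable that returns one parsed response, so the paging logic can be tested without network access. It assumes paging["next"] is absent on the last page. The course fields "name" and "durationWeeks" are placeholders, not confirmed Coursera field names; check the real course objects for the actual duration field.

```python
import csv
import io


def fetch_all_courses(fetch_page, limit=30):
    """Collect every course by following the paging "next" value.

    `fetch_page(start, limit)` is a hypothetical callable that sends one
    catalogResultQuery request and returns the parsed JSON response.
    """
    courses, start = [], 0
    while start is not None:
        response = fetch_page(start, limit)
        browse = response["data"]["CatalogResultsV2Resource"]["browseV2"]
        if browse["elements"]:
            courses.extend(browse["elements"][0]["courses"]["elements"])
        # ASSUMPTION: "next" is missing on the last page, which ends the loop.
        start = browse["paging"].get("next")
    return courses


def write_sorted_csv(courses, out, duration_key="durationWeeks"):
    """Write courses to `out` as CSV, shortest duration first.

    ASSUMPTION: `duration_key` is a placeholder for a numeric duration field;
    courses without it are pushed to the end.
    """
    rows = sorted(courses, key=lambda c: c.get(duration_key, float("inf")))
    writer = csv.writer(out)
    writer.writerow(["name", duration_key])
    for course in rows:
        writer.writerow([course.get("name", ""), course.get(duration_key, "")])
```

To write to a file rather than a buffer, open it with newline="" as the csv module recommends, for example with open("courses.csv", "w", newline="") as f: write_sorted_csv(courses, f).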

You can find the entire script here: https://gist.github.com/anabarasan/e6b5b6842e97592ec1eaffbc30ce703e
