programming

getting cidr from the ipaddress / subnet

I have been working on the hiera data generation / validation process. To generate certain details I needed the IP address both in the usual 1.1.1.1 form and in CIDR form. I had initially asked for both as input, but the users came back saying: what if we give the subnet information along with the IP address, like 1.1.1.1/9, and you calculate the CIDR information from that? Sounds right.

I had initially asked for both from the user because I did not know how to calculate it. But since the user asked for it to be that way, I started to look into how to arrive at that value. I had the first clue with me: the user had told me to look into an XOR based calculation. I started with a generic search to first understand what it is and how to arrive at it.

Maybe my search fu is not very good or whatever, but my searches always led to getting IP ranges / calculating the subnet mask from the CIDR, and not to how to arrive at the CIDR itself.

There were a few calculators which gave what I wanted, but did not explain how to arrive at it. Playing around with them and seeing the outputs, I could sense a sort of pattern emerging, and I thought I had seen something like this before. Then I remembered that there was a third party package which some other application at work was using. I searched for it and, lo and behold, Python 3 has that function built in. Not sure if Python 2 had it or not.

Python 3 has this module ipaddress which can be used for IP address related things. So for my use case all I had to do was call the appropriate function with the ipaddress/subnet information and get the output. I had to set the strict argument to False, so that it would accept a host address as input instead of only a proper CIDR network address.

>>> import ipaddress
>>> network = ipaddress.IPv4Network("10.25.123.20/20", strict=False)
>>> str(network)
'10.25.112.0/20'
>>> 
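Beyond the string form, the network object exposes the individual pieces, so whichever part is needed can be pulled out directly (these attributes are part of the standard ipaddress module):

```python
import ipaddress

# strict=False lets a host address like 10.25.123.20 pass;
# the host bits are zeroed out to get the network address
network = ipaddress.IPv4Network("10.25.123.20/20", strict=False)

print(network.network_address)  # 10.25.112.0
print(network.prefixlen)        # 20
print(network.netmask)          # 255.255.240.0
print(network.with_prefixlen)   # 10.25.112.0/20
```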

Office, programming

Testing hiera data

I have been looking around for ways to validate puppet hiera data files.

my requirements were

  • check if the yaml is well formed.
  • check if all required settings are provided in the yamls
  • check if the required settings values are of expected format
  • inform how a value is resolved when the setting is present in multiple files in the hierarchy.

The basic idea that I had arrived at was to use JSON schema based validation. Basically, create schemas for the various files in the hiera hierarchy, and validate each file against its own schema.
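As a rough illustration of the idea, here is a minimal hand-rolled checker (not a real JSON schema validator, and the setting names and types are made up for the example):

```python
# minimal sketch: a "schema" maps each required setting to its expected type
schema = {
    "ntp_server": str,
    "enable_firewall": bool,
    "max_connections": int,
}

def validate(data, schema):
    """Return a list of problems found in one hiera data dict."""
    problems = []
    for key, expected_type in schema.items():
        if key not in data:
            problems.append(f"missing required setting: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(
                f"{key}: expected {expected_type.__name__}, "
                f"got {type(data[key]).__name__}")
    return problems

data = {"ntp_server": "pool.ntp.org", "enable_firewall": "yes"}
for problem in validate(data, schema):
    print(problem)
```

A real implementation would use proper JSON schema documents instead of a type map, but the shape of the check is the same: one schema per file in the hierarchy.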

I already had a toy project which does that, which I made sometime last year. But I thought I would research a little and do things a bit more properly.

I was curious to see whether this problem had been faced by others. Searching around the internet, I found that puppet has a command which sort of does this. It's not exactly a validation of the data, but it shows what hiera would return when a setting is requested.

puppet lookup command

That still wouldn’t help in our case. We can see what setting would be applied, but not whether the engineer has filled it in the correct format, or whether a value is a boolean or a string; those cannot be validated. Also, we can look up only one setting at a time. But it does help in seeing how puppet arrives at the value, with the helpful “explain” option.

On further searching I found that the exact same idea had been thought of before, and the post also referred to tools which would help achieve it.

https://logicminds.github.io/blog/2016-01-16-testing-hiera-data/

The post describes puppet-retrospec and kwalify to achieve that, but those didn’t work out for me; it seems they are for older versions of puppet.

For ensuring the yaml itself is well formed, I thought there should be something like pylint or jslint, and I was right: a simple search pointed me to something called yamllint. It also has a python interface with which it can be included in our scripts. A simple example of it would be like

from yamllint import linter
from yamllint.config import YamlLintConfig

conf = YamlLintConfig('extends: default')

filepath = "./test.yaml"
with open(filepath) as yamlfile:
    yaml = yamlfile.read()
problems = linter.run(yaml, conf, filepath)
for problem in problems:
    print(f"{problem.line}:{problem.column} \t {problem.level} \t {problem.desc} ({problem.rule})")

which produces

$ python testyamllint.py 
1:1      warning         missing document start "---" (document-start)
3:1      error   duplication of key "a" in mapping (key-duplicates)
3:7      error   no new line character at the end of file (new-line-at-end-of-file)

for a sample yaml

a: "b"
b: "c"
a: "d"

let’s see how it goes.

programming

Sorting Coursera Course List by duration

With the lockdown going on, I was getting bored. Work from home, playing with the kids and playing Injustice have sort of become a routine. The first weeks of lockdown I spent the spare time watching Arrow on Airtel Xstream, which then suddenly disappeared from the platform. I didn’t know what else to do. Then Coursera announced free courses; I saw Cloud Computing 101 and thought I would refresh my memory on the subject. It was 3 weeks long, and with byte sized videos and mostly theory, I sort of finished it in 1.5 – 2 weeks. So I thought I would look into some other short duration courses.

The Coursera page has a categories list, and once you select a category from the filters you can select a duration and get a list of courses within that duration. But you cannot sort by duration.

I first looked to see if Coursera offered any APIs. They do, but you have to join their affiliate program. I just wanted to sort by duration, so that would not be helpful.

So I decided to see what happens behind the scenes when I set a filter and then use that to get the data and sort it separately.

So what happens behind the scenes

When checking with dev tools, I observed the following: when we select a filter and click on Apply Filters, a POST request is sent to this URL

https://www.coursera.org/graphqlBatch?opname=catalogResultQuery

That doesn’t seem right, right? The filter information is not in the URL. It is being sent as POST data. Checking the payload, there was a huge JSON document being sent. It had all the information, including a huge query which the Coursera backend uses to fetch the required data, I guess. But the thing we are interested in is the variables object

{
  "limit":30,
  "facets":[
    "skillNameMultiTag",
    "jobTitleMultiTag",
    "difficultyLevelTag",
    "languages",
    "productDurationEnum:1-4 Weeks",
    "entityTypeTag",
    "partnerMultiTag",
    "categoryMultiTag:information-technology",
    "subcategoryMultiTag"
  ],
  "sortField":"",
  "start":"0",
  "skip":false
}

The pieces of information we have to manipulate are limit, start, and the value “categoryMultiTag:information-technology” inside the facets list. limit and start control the pagination and the number of courses returned in a single request, and categoryMultiTag is the course category we are interested in. Making a request to this URL, we get a JSON response with the course information and the pagination details.

The pagination details are found under ["data"]["CatalogResultsV2Resource"]["browseV2"]["paging"], which has the total number of courses and the next request start value.

The courses list is under ["data"]["CatalogResultsV2Resource"]["browseV2"]["elements"][0]["courses"]["elements"].
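Assuming a response shaped as described above, pulling the two pieces out looks roughly like this (the sample dict below is hand-made for illustration, not an actual Coursera payload, and the field names inside paging and the course entries are assumptions):

```python
# hand-made sample response mirroring the paths described above
response = {
    "data": {
        "CatalogResultsV2Resource": {
            "browseV2": {
                "paging": {"total": 245, "next": "30"},
                "elements": [
                    {"courses": {"elements": [
                        {"name": "Cloud Computing 101", "duration": "3 weeks"},
                    ]}},
                ],
            }
        }
    }
}

# walk down to the two interesting parts of the payload
browse = response["data"]["CatalogResultsV2Resource"]["browseV2"]
paging = browse["paging"]
courses = browse["elements"][0]["courses"]["elements"]

print(paging["total"], paging["next"])   # total courses, next start value
print([c["name"] for c in courses])
```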

Now that we knew where to get the information, what to send and what to expect in response, I wrote a script to recursively fetch all courses of a category and duration and dump them to csv.
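The core of such a script is just a paging loop. Here is an iterative sketch of the same idea; fetch_page is a stand-in for the actual POST request to the graphqlBatch URL (names and the fake catalog are mine, not from the real script):

```python
def fetch_all_courses(fetch_page, limit=30):
    """Collect every course by advancing `start` until we have them all.

    fetch_page(start, limit) must return (courses, total) for one page;
    in the real script it would issue the POST request described above.
    """
    courses = []
    start = 0
    total = None
    while total is None or start < total:
        page, total = fetch_page(start, limit)
        if not page:
            break  # defensive stop if the backend returns an empty page
        courses.extend(page)
        start += limit
    return courses

# demo with a fake backend of 7 "courses"
catalog = [{"name": f"course-{i}", "weeks": i % 4 + 1} for i in range(7)]

def fake_fetch(start, limit):
    return catalog[start:start + limit], len(catalog)

all_courses = fetch_all_courses(fake_fetch, limit=3)
by_duration = sorted(all_courses, key=lambda c: c["weeks"])
print(len(all_courses))          # 7
print(by_duration[0]["weeks"])   # 1
```

Once everything is collected, sorting by duration is a plain sorted() call, which is exactly what the filter UI would not do.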

you can find the entire script here https://gist.github.com/anabarasan/e6b5b6842e97592ec1eaffbc30ce703e