“dit dit dit dit – dit dit” We hear this noise in films, war documentaries and even in everyday life. What is this adorable little sound of melody? Yes you are right. This is not just any random sound generated by highly skilled operator (myself) using this little clicking thingy, but THE Morse Code itself. It is just a fancy way of saying “hi”!

And today we have no less than an old school Morse Code challenge (with a bit of my twist). Why? Well, the story is that few days ago an explosion happened in the office (reason unknown) and the staff – all Morse Code experts – are now deaf. Despite the explosion we still have desktops, keyboards and mice for work ;). Due to budget constraints, our CEO decided to hire a junior Morse Code operator as a secretary to handle calls and stuff. One day, we received an important call from our analyst and our operator translated everything to Morse Code. Disaster! We ran out of ink! So we are left with a blank piece of paper with no message on it. But the good thing is we always keep a log of the timestamps when the operator hits the transmitter.

Here is the log we have. Can you parse the log and find out what the message is? Oh and one last thing:

Firstly, we need to parse the timestamps from the log properly since they are not actually in kdb format. There are a few ways to do it, here is an example:

q)raze(" T";" ")0:`:mc_log_20150213.log

This code reads in the log as text, separates them using space delimiter, takes the last column as time type and brings the result up one level of depth.

With the available timestamps, we need the time difference between them to find out which is short/long beep.

q)1_deltas raze(" T";" ")0:`:mc_log_20150213.log

Now let us analyse the data a bit.

Remember that the log is the time when transmitter is hit. So there can be 6 scenarios between the 2 timestamps (short beep, short beep and short wait, long beep, long beep and short wait, short beep and long wait, long beep and long wait). You can learn more about the timing of morse code here (this problem doesn’t follow the exact unit measurement of waiting time, junior operator is still learning his morse code!). It might not be easy to spot that but let us re-arrange the data a bit.

Can you see it now? The last 2 highest bars are surely short beep long wait and long beep long wait. We can see that the difference between them is more than 200. Since the first few bars are short beeps, then the bars at 400 mark must be the long beep. The bars before 400 mark is surely the short beep short wait and you can make out the rest. Bucketing the differences makes it even more obvious.

"."| short beep
"-"| long beep
" "| wait between letters
"|"| wait between words

The code splits the time differences into their group using binary search against 00:00:00.150 00:00:00.350 00:00:00.400 00:00:00.600 00:00:01.050 00:00:01.300. This is then indexed to their morse code using (“.”;”. “;”-“;”- “;”.|”;”-|”). Finally we use “raze” to concatenate the strings into one line. Note that we use “bin” here to increase the efficiency of the command since linear search(?) is slower.

Another way to perform such binary search mapping (thanks to Walter Pallestrong) is using step function by applying sorted attribute to the mapping dictionary.

We have the morse code now! Hurrah! But wait, what about the last timestamp? The last timestamp can be either short/long beep! How do we know which one is it? Well we don’t at the moment, so we need to test both scenarios to see which message makes more sense.

q){" "sv m`$" "vs'"|"vs x}each raze[(".";". ";"-";"- ";".|";"-|")asc[distinct l]bin l:50 xbar 1_deltas raze(" T";" ")0:`:mc_log_20150213.log],/:".-"
"to the magnificent aquaq employees, we have finally produced the result of this friday's euromillions lottery numbers. we believe this is a 99.99 percent chance of winning the 100 million euro pool. so the numbers are 5 11 23 29 43 and the luck stars are 1 7 "
"to the magnificent aquaq employees, we have finally produced the result of this friday's euromillions lottery numbers. we believe this is a 99.99 percent chance of winning the 100 million euro pool. so the numbers are 5 11 23 29 43 and the luck stars are 1 7."

So from the result, we can see that the last message makes more sense and the first one can be discarded since the last char is not parsed properly.

I hope you enjoy this challenge. It touches ETL, some data analysis and parsing data into readable format which I think increase the learning curve quite well. Until next time!

Interesting. From wikipedia it seems that “. ” and “-” gives you similar difference between hits.
“. ” is one unit for beep and three units between letters.
“-” is three units for beep and one unit between parts within letter.
If you look at your analysis you can see really 5 levels not six, how did you come up with the rule that 350-400ms is “. ” and 400… is :”-“?
What is 50 actually? If it’s unit than it’s incorrect as “.” should be 100 not 150, etc

Thanks! The log is generated by preparing the message and pre-setting the time unit for each notation:

q)l:"to the magnificent aquaq employees, we have finally produced the result of this friday's euromillions lottery numbers. we believe this is a 99.99 percent chance of winning the 100 million euro pool. so the numbers are 5 11 23 29 43 and the luck stars are 1 7."
q)s:".- |"!150 400 200 900

The message is then translated into morse code using the “m” dictionary from the solution:

q)"|"sv" "sv'string m?/:" "vs l

The morse code is then indexed to the “s” dictionary:

q)s"|"sv" "sv'string m?/:" "vs l

But this makes the problem a bit simple, so I add in some noises:

q)s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10

Now add the time unit for “. ” “.|” “- ” “-|” together to obtain the true waiting time. First cut time units:

q)where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10

and do the sums:

q)sums sum@'where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10

Add the sums to a time and omit the last time (don’t want the end time of transmission):

q)-1_.z.t+0,sums sum@'where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10

Lastly change the time to unix format:

q)"Fri, 13 Feb 2015 ",/:string -1_.z.t+0,sums sum@'where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10

and save to a file:

q)`:mc_log_20150213.log 0:"Fri, 13 Feb 2015 ",/:string -1_.z.t+0,sums sum@'where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10

Wish I have something here that can generate the log more realistically but here you go 🙂

## Comments 7

Pingback: Morse code kdb+ programmer challenge

Interesting. From wikipedia it seems that “. ” and “-” gives you similar difference between hits.

“. ” is one unit for beep and three units between letters.

“-” is three units for beep and one unit between parts within letter.

If you look at your analysis you can see really 5 levels not six, how did you come up with the rule that 350-400ms is “. ” and 400… is :”-“?

What is 50 actually? If it’s unit than it’s incorrect as “.” should be 100 not 150, etc

Do I miss something?

Hi Nikita, I’ve mentioned that:

So the unit of measurement in wiki doesn’t apply in this problem. You will have to look at the data and figure out which one is which.

This was a really cool challenge, especially for cw fans. Very tidy solution. How was the log generated?

Thanks! The log is generated by preparing the message and pre-setting the time unit for each notation:

`q)l:"to the magnificent aquaq employees, we have finally produced the result of this friday's euromillions lottery numbers. we believe this is a 99.99 percent chance of winning the 100 million euro pool. so the numbers are 5 11 23 29 43 and the luck stars are 1 7."`

q)s:".- |"!150 400 200 900

The message is then translated into morse code using the “m” dictionary from the solution:

`q)"|"sv" "sv'string m?/:" "vs l`

The morse code is then indexed to the “s” dictionary:

`q)s"|"sv" "sv'string m?/:" "vs l`

But this makes the problem a bit simple, so I add in some noises:

`q)s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10`

Now add the time unit for “. ” “.|” “- ” “-|” together to obtain the true waiting time. First cut time units:

`q)where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10`

and do the sums:

`q)sums sum@'where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10`

Add the sums to a time and omit the last time (don’t want the end time of transmission):

`q)-1_.z.t+0,sums sum@'where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10`

Lastly change the time to unix format:

`q)"Fri, 13 Feb 2015 ",/:string -1_.z.t+0,sums sum@'where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10`

and save to a file:

`q)`:mc_log_20150213.log 0:"Fri, 13 Feb 2015 ",/:string -1_.z.t+0,sums sum@'where[not r in" |"]cut s[r]+count[r:"|"sv" "sv'string m?/:" "vs l]?10`

Wish I have something here that can generate the log more realistically but here you go 🙂

I need help decoding a morse code message that is being sent with a flashing light. Can anyone help? jennifer.harrison1978@gmail.com

200 0 2020 22 200 0 2020 222 22 2002 0200 0 2 0 😀

“good luck and have lun emith the challenge”