kdb programming challenge – Morse Code

Blog kdb+ 4 Mar 2015

Data Intellect

Morse Code

dit dit dit – dah dah dah – dit dit dit

“dit dit dit dit – dit dit” We hear this noise in films, war documentaries and even in everyday life. What is this adorable little sound of melody? Yes you are right. This is not just any random sound generated by highly skilled operator (myself) using this little clicking thingy, but THE Morse Code itself. It is just a fancy way of saying “hi”!

And today we have no less than an old school Morse Code challenge (with a bit of my twist). Why? Well, the story is that few days ago an explosion happened in the office (reason unknown) and the staff – all Morse Code experts – are now deaf. Despite the explosion we still have desktops, keyboards and mice for work ;). Due to budget constraints, our CEO decided to hire a junior Morse Code operator as a secretary to handle calls and stuff. One day, we received an important call from our analyst and our operator translated everything to Morse Code. Disaster! We ran out of ink! So we are left with a blank piece of paper with no message on it. But the good thing is we always keep a log of the timestamps when the operator hits the transmitter.

Here is the log we have. Can you parse the log and find out what the message is? Oh and one last thing:

[code langauge = css]
–. — — -..|.-.. ..- -.-. -.-|.- -. -..|…. .- …- .|..-. ..- -.|.– .. – ….|- …. .|-.-. …. .- .-.. .-.. . -. –. .
[/code]

[showhide type=”post1″ more_text=”Show Solution” less_text=”Solution” hidden=”yes”]

Firstly, we need to parse the timestamps from the log properly since they are not actually in kdb format. There are a few ways to do it, here is an example:

[code langauge = css]
q)raze(”    T”;” “)0:`:mc_log_20150213.log
[/code]

This code reads in the log as text, separates them using space delimiter, takes the last column as time type and brings the result up one level of depth.

With the available timestamps, we need the time difference between them to find out which is short/long beep.

[code langauge = css]
q)1_deltas raze(”    T”;” “)0:`:mc_log_20150213.log
[/code]

Now let us analyse the data a bit.

Kdb Morse Code Graph

 

Remember that the log is the time when transmitter is hit. So there can be 6 scenarios between the 2 timestamps (short beep, short beep and short wait, long beep, long beep and short wait, short beep and long wait, long beep and long wait). You can learn more about the timing of morse code here (this problem doesn’t follow the exact unit measurement of waiting time, junior operator is still learning his morse code!). It might not be easy to spot that but let us re-arrange the data a bit.

morse2

Can you see it now? The last 2 highest bars are surely short beep long wait and long beep long wait. We can see that the difference between them is more than 200. Since the first few bars are short beeps, then the bars at 400 mark must be the long beep. The bars before 400 mark is surely the short beep short wait and you can make out the rest. Bucketing the differences makes it even more obvious.

[code langauge = css]
q)asc distinct 50 xbar 1_deltas raze(” T”;” “)0:`:mc_log_20150213.log
00:00:00.150 00:00:00.350 00:00:00.400 00:00:00.600 00:00:01.050 00:00:01.300
[/code]

With this information, we can translate the timestamps into morse code as such:

[code langauge = css]
q)raze(“.”;”. “;”-“;”- “;”.|”;”-|”)asc[distinct l]bin l:50 xbar 1_deltas raze(” T”;” “)0:`:mc_log_20150213.log
“- —|- …. .|– .- –. -. .. ..-. .. -.-. . -. -|.- –.- ..- .- –.-|. — .–. .-.. — -.– . . … –..–|.– .|…. .- …- .|..-. .. -. .- .-.. .-.. -.–|.–. .-. — -.. ..- -.-. . -..|- …. .|.-. . … ..- .-.. -|— ..-.|- …. .. …|..-. .-. .. -.. .- -.– .—-. …|. ..- .-. — — .. .-.. .-.. .. — -. …|.-.. — – – . .-. -.–|-. ..- — -… . .-. … .-.-.-|.– .|-… . .-.. .. . …- .|- …. .. …|.. …|.-|—-. —-. .-.-.- —-. —-.|.–. . .-. -.-. . -. -|-.-. …. .- -. -.-. .|— ..-.|.– .. -. -. .. -. –.|- …. .|.—- —– —–|– .. .-.. .-.. .. — -.|. ..- .-. —|.–. — — .-.. .-.-.-|… —|- …. .|-. ..- — -… . .-. …|.- .-. .|…..|.—- .—-|..— …–|..— —-.|….- …–|.- -. -..|- …. .|.-.. ..- -.-. -.-|… – .- .-. …|.- .-. .|.—-|–… .-.-.”
[/code]

Notation being:

[code langauge = css]
“.”| short beep
“-“| long beep
” “| wait between letters
“|”| wait between words
[/code]

The code splits the time differences into their group using binary search against 00:00:00.150 00:00:00.350 00:00:00.400 00:00:00.600 00:00:01.050 00:00:01.300. This is then indexed to their morse code using (“.”;”. “;”-“;”- “;”.|”;”-|”). Finally we use “raze” to concatenate the strings into one line. Note that we use “bin” here to increase the efficiency of the command since linear search(?) is slower.

Another way to perform such binary search mapping (thanks to Walter Pallestrong) is using step function by applying sorted attribute to the mapping dictionary.

[code]
q)raze(`s#asc[distinct l]!(“.”;”. “;”-“;”- “;”.|”;”-|”))l:50 xbar 1_deltas raze(” T”;” “)0:`:mc_log_20150213.log
[/code]

We have the morse code now! Hurrah! But wait, what about the last timestamp? The last timestamp can be either short/long beep! How do we know which one is it? Well we don’t at the moment, so we need to test both scenarios to see which message makes more sense.

[code langauge = css]
q)raze[(“.”;”. “;”-“;”- “;”.|”;”-|”)asc[distinct l]bin l:50 xbar 1_deltas raze(” T”;” “)0:`:mc_log_20150213.log],/:”.-”
[/code]

Now all we need is a morse-code dictionary to translate the code to actual words:

[code langauge = css]
q)m:![`$(“.-“;”-…”;”-.-.”;”-..”;”.”;”..-.”;”–.”;”….”;”..”;”.—“;”-.-“;”.-..”;”–“;”-.”;”—“;”.–.”;”–.-“;”.-.”;”…”;”-“;”..-“;”…-“;”.–“;”-..-“;”-.–“;”–..”;”.—-“;”..—“;”…–“;”….-“;”…..”;”-….”;”–…”;”—..”;”—-.”;”—–“;”.-.-.-“;”–..–“;”.—-.”);.Q.a,”1234567890.,'”]

q){” “sv m`$” “vs'”|”vs x}each raze[(“.”;”. “;”-“;”- “;”.|”;”-|”)asc[distinct l]bin l:50 xbar 1_deltas raze(” T”;” “)0:`:mc_log_20150213.log],/:”.-”
“to the magnificent aquaq employees, we have finally produced the result of this friday’s euromillions lottery numbers. we believe this is a 99.99 percent chance of winning the 100 million euro pool. so the numbers are 5 11 23 29 43 and the luck stars are 1 7 ”
“to the magnificent aquaq employees, we have finally produced the result of this friday’s euromillions lottery numbers. we believe this is a 99.99 percent chance of winning the 100 million euro pool. so the numbers are 5 11 23 29 43 and the luck stars are 1 7.”
[/code]

So from the result, we can see that the last message makes more sense and the first one can be discarded since the last char is not parsed properly.

I hope you enjoy this challenge. It touches ETL, some data analysis and parsing data into readable format which I think increase the learning curve quite well. Until next time!
[/showhide]

Share this:

LET'S CHAT ABOUT YOUR PROJECT.

GET IN TOUCH