
Media Lab 2019 Cycle Portfolio

Lamtharn “Hanoi” Hantrakul

The following is a curated subset of projects, videos and links, grouped roughly by area of expertise:

Additional links to:

→ GitHub

→ Research Portfolio

→ Blog

→ LinkedIn

Physics + Digital Signal Processing

Patent-pending Surrogate Soundboard System

My BS Thesis in Applied Physics from Yale University develops a patent-pending mechanism for transferring acoustic vibrations from a live violin to a remote violin using custom-fabricated transducers and digital inverse filters. This enables a “live” violin to be streamed in real time to a “surrogate” violin located in another concert hall through standard Web Audio protocols.
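As a rough illustration of the inverse-filter idea, the sketch below designs a regularized frequency-domain inverse of a measured transducer-to-soundboard impulse response and applies it to the streamed bridge signal. The impulse response, audio signal and regularization constant are stand-ins for illustration, not values from the actual system.

```python
import numpy as np
from scipy.signal import fftconvolve

def design_inverse_filter(h, n_fft=4096, reg=1e-3):
    """Regularized (Tikhonov-style) inverse of a measured impulse response h.
    The reg term damps frequencies where the surrogate soundboard responds
    weakly, so the inverse filter does not blow up at resonant nulls."""
    H = np.fft.rfft(h, n_fft)
    H_inv = np.conj(H) / (np.abs(H) ** 2 + reg)
    return np.fft.irfft(H_inv, n_fft)

# Stand-in data: h_surrogate plays the role of the measured transducer-to-soundboard
# impulse response, x_bridge the pickup signal streamed from the live violin.
h_surrogate = np.random.randn(2048) * np.exp(-np.arange(2048) / 300.0)
x_bridge = np.random.randn(48000)
g = design_inverse_filter(h_surrogate)
y_drive = fftconvolve(x_bridge, g, mode="same")  # drive signal for the surrogate's transducer
```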

Full documentation available

[Figure: systems diagram]

Pat. Pending #1501004501. Hantrakul, L. “A surrogate soundboard system for the violin” Filed 10/8/2015. Department of Intellectual Property, Thailand.

Hantrakul, L., Kuc, R., and Wilen, L. “Surrogate Soundboards for novel transmission of violin audio”. Proceedings of the International Computer Music Conference (ICMC 2016) at HKU University of the Arts in Utrecht, The Netherlands.

Industrial Design + Mechanical Engineering

Fidular

Fidular is a patent-pending and award-winning cross-cultural modular fiddle system, placing in the top 5% of designs at the A’ Design Awards and winning Student Runner-Up at the Core77 Design Awards. It enables musicians and luthiers to detach and interchange components such as strings and chambers between fiddles from across Asia and the Middle East. A print of this design is currently on special exhibition at the Musical Instrument Museum in Phoenix, AZ, and the design was accepted to the NIME 2016 conference.

Full documentation available.

[Image: awards]

Pat. Pending #1601000261. Hantrakul, L. “A shape-shifting waveguide and interchangeable front-panel system for Asian and Middle Eastern Fiddles” Filed 20/1/2016. Department of Intellectual Property, Bangkok, Thailand.

Pat. Pending #1501005900. Hantrakul, L. “A magnetic and modular system for Asian fiddles” Filed 29/9/2015. Department of Intellectual Property, Thailand.

Hantrakul, L. “fidular: a magnetic and modular system for fiddles from Southeast Asia, East Asia and the Middle East”. Proceedings of the 2016 New Interfaces for Musical Expression conference (NIME 2016) at Griffith University in Brisbane, Australia.

Deep Learning + Machine Learning

The “Skywalker” Ultrasound Prosthetic

I am part of an inter-departmental team spanning AI, ultrasound and physiology that develops novel prosthetics leveraging ultrasound and machine learning. Our system delivers first-in-class finger-by-finger control for amputees, enabling high-dexterity tasks like playing the piano, a feat impossible with the sensing used on conventional prosthetics today. Our work was featured in December 2017 by NVIDIA, IEEE, CNN and many other news sources. By using ultrasound instead of traditional electromyography (EMG), we are able to “see” deeper muscle activity in the arm.

I am one of three team members implementing the ML and DSP on this project.

[Image: Skywalker press feature]

Publication pending for February 2018

MS Thesis: ultrasound-based finger-by-finger regression using embedded single elements

Conventional prosthetics use electromyography (EMG) to sense muscle activation from the surface of an amputee’s residual muscles. My MS Thesis develops the second iteration of the new ultrasound-based system outlined above. I use Deep Learning directly on raw ultrasound pulses to output continuous regressions. This approach skips the traditional imaging step and enables the system to be miniaturized substantially by custom-designing the ultrasound transducer array. My thesis is advised by Gil Weinberg (Georgia Tech), Byron Boots (Google Brain) and Mason Bretan (Futurewei Inc.).
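A minimal sketch of the regression idea, assuming a handful of embedded single-element transducers producing raw A-mode pulses; the channel count, pulse length and layer sizes below are illustrative assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn

class PulseRegressor(nn.Module):
    """Sketch: raw ultrasound pulses -> continuous per-finger activations,
    with no intermediate imaging step."""
    def __init__(self, n_elements=4, pulse_len=1024, n_fingers=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_elements, 16, kernel_size=9, stride=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(8),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 8, 64), nn.ReLU(),
            nn.Linear(64, n_fingers), nn.Sigmoid(),  # flexion estimate in [0, 1] per finger
        )

    def forward(self, x):            # x: (batch, n_elements, pulse_len)
        return self.head(self.features(x))

model = PulseRegressor()
pulses = torch.randn(8, 4, 1024)     # stand-in batch of raw pulses
flexion = model(pulses)              # (8, 5) continuous finger estimates
```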


Due to university patents and invention disclosures, I cannot share my GitHub repository or an image of this setup until my Thesis is published in May 2018.

SmartEar Deep Learning Auditory Scene Analysis

SmartEar is a voice-centric, AI-enabled and patented in-the-ear-canal (not your regular out-of-ear-canal earbuds!) earpiece. As a remote part-time audio deep learning engineer, I develop deep learning-based Auditory Scene Analysis (ASA) that enables SmartEar to identify its acoustic environment (e.g. indoor vs. outdoor, loud vs. quiet) using CNNs on beamformed audio from the earpiece.
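A hedged sketch of what such a front end could look like: a log-mel spectrogram of the (already beamformed) mono audio fed to a tiny CNN classifier. The class list, network size and 16 kHz sample rate are assumptions for illustration, not SmartEar’s actual configuration.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

def logmel(y, sr=16000, n_mels=64):
    """Log-mel spectrogram of beamformed mono audio."""
    m = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(m)

# Tiny CNN over the (1, n_mels, frames) "image"; class names are illustrative.
classes = ["indoor_quiet", "indoor_loud", "outdoor_quiet", "outdoor_loud"]
net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(16, len(classes)),
)

y = np.random.randn(16000 * 2).astype(np.float32)                  # stand-in 2 s clip
x = torch.from_numpy(logmel(y)).float().unsqueeze(0).unsqueeze(0)  # (1, 1, mels, frames)
scene = classes[net(x).argmax(dim=1).item()]
```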

This work is a joint industry-academia research project between SmartEar and the Robotic Musicianship Group (my lab affiliation) at the Georgia Institute of Technology.

Audio & Music + ML

Extending Magenta’s NSynth

Recently featured on the Google Magenta Blog, our hack won the Outside Lands music hackathon! I led the winning team and documented how we connected the open-source NSynth model to a real-time data streaming system that let a group of phones control the point in the autoencoder’s latent space z from which the waveform was generated.
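A simplified sketch of the control idea, with stand-in encode/decode steps: each phone streams a weight, and the weights blend a small bank of precomputed NSynth encodings into the latent sequence that the decoder then synthesizes. The array shapes and four-phone setup are illustrative assumptions.

```python
import numpy as np

def blend_latents(z_bank, weights):
    """Mix precomputed NSynth encodings with phone-controlled weights.

    z_bank:  (n_sources, time, channels) encodings of a few source notes,
             produced offline with the open-source NSynth encoder.
    weights: per-phone control values in [0, 1], one per source sound.
    Returns a single latent sequence for the NSynth decoder to synthesize.
    """
    w = np.asarray(weights, dtype=np.float32)
    w = w / (w.sum() + 1e-8)                 # normalize so the mix stays in range
    return np.tensordot(w, z_bank, axes=1)   # weighted sum over source encodings

# Hypothetical usage: four phones each steer one source sound.
z_bank = np.random.randn(4, 125, 16).astype(np.float32)  # stand-in encodings
phone_msg = [0.7, 0.1, 0.0, 0.2]                          # values streamed from phones
z_mix = blend_latents(z_bank, phone_msg)                  # fed to the NSynth decoder
```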

GitHub repository

DQNs for Robotic Musicians

The choice of “what musical note to play next” is a process that combines multiple modalities such as sight, sound, memory and the physical body playing the instrument. ML and DL models for music often treat a sequence of notes purely as a uni-modal “statistical occurrence”. This project works towards a joint representation between a robot’s degrees of freedom (DOFs) and musical cognition, informing the robot how it should path-plan and coordinate its arms during musical improvisation. I develop models for learned bi-manual and tetra-manual coordination that build on top of DeepMind’s Atari-playing DQN (think multi-agent Pong with four paddles).
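A minimal sketch of the DQN side, assuming a flattened joint action space so that one Q-value covers a coordinated move of both arms; the state encoding, action counts and network size are illustrative, not the project’s actual setup.

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """DQN-style Q-network sketch: the joint action space enumerates
    combinations of per-arm moves, so a single argmax picks a coordinated choice."""
    def __init__(self, state_dim=16, moves_per_arm=3, n_arms=2):
        super().__init__()
        self.n_actions = moves_per_arm ** n_arms   # e.g. 9 joint actions for 2 arms
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, self.n_actions),
        )

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy joint action over both arms."""
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.net(state).argmax())

q = QNet()
state = torch.randn(16)        # stand-in encoding of musical context + arm positions
joint_action = q.act(state)    # decoded elsewhere into (arm_1_move, arm_2_move)
```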

GitHub repository for Baseline results and full documentation

GitHub repository for in-progress implementation using OpenAI’s reference RL algorithms

[Figure: systems diagram]

GestureRNN

In a conversation with Doug Eck from the Google Magenta team, we discussed how humans create art in “low dimensional” spaces, e.g. the movements of a brushstroke, versus “high dimensional” spaces, e.g. the final RGB values of every pixel. Inspired by Magenta’s SketchRNN, I developed a system where an RNN draws musical gestures on the surface of an XY pad, learnt from an expert musician. The RNN doesn’t generate the waveforms directly, but learns how to navigate the sonic space of the XY pad to generate music.
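A rough sketch of the generation loop, assuming each gesture is a sequence of normalized (x, y) points on the pad; the layer sizes and output parameterization are illustrative assumptions, not the trained model.

```python
import torch
import torch.nn as nn

class GestureRNN(nn.Module):
    """Sketch of an RNN that emits the next XY-pad point given the gesture so far."""
    def __init__(self, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)        # next (x, y) in [0, 1]^2

    def forward(self, xy_seq, state=None):
        h, state = self.lstm(xy_seq, state)
        return torch.sigmoid(self.out(h[:, -1])), state

# Autoregressive generation: feed each predicted point back in as the next input.
model = GestureRNN()
point, state = torch.rand(1, 1, 2), None      # seed touch position
gesture = []
for _ in range(64):
    nxt, state = model(point, state)
    gesture.append(nxt.squeeze(0).tolist())   # point sent to the synth mapped to the pad
    point = nxt.unsqueeze(1)                  # reshape to (1, 1, 2) for the next step
```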

GitHub repository with full documentation and code

[Figure: systems diagram]

Klustr - visualization of large audio datasets

High dimensional data such as audio is often organized using simple high-level descriptors like vocal_shout_1.wav or funky_bass_1.wav. However, these labels do not capture the nuances between sounds: how similar are vocal_shout_1.wav and vocal_shout_5.wav? We develop a pipeline that selects the best features (like MFCCs, STFTs and WaveNet encodings) and dimensionality reduction techniques (PCA, t-SNE, UMAP) to create a 2D map of similar sounds in the space of timbre. We intend for this tool to be used by content creators in entertainment, audio production and music production to navigate large sample banks.
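A small sketch of one possible configuration of that pipeline, using mean MFCC vectors as the feature and UMAP as the reduction; the sample folder is hypothetical.

```python
import glob
import librosa
import numpy as np
import umap   # umap-learn

def clip_features(path, sr=22050, n_mfcc=20):
    """Summarize one clip as its mean MFCC vector (one of several feature
    choices in the pipeline; WaveNet encodings would slot in the same way)."""
    y, _ = librosa.load(path, sr=sr, mono=True)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Hypothetical sample bank: every clip becomes one point on a 2D timbre map.
paths = sorted(glob.glob("samples/*.wav"))
X = np.stack([clip_features(p) for p in paths])
xy = umap.UMAP(n_components=2, n_neighbors=15).fit_transform(X)  # (n_clips, 2) map coordinates
```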

GitHub repository with full documentation and code

[Figure: systems diagram]

Cross-modal CNN-LSTMs for musical hallucinations

We train a neural network to regress from the image features of a video of a xylophone being struck to the corresponding audio features. Briefly, we define an audio feature s(t) computed from the short-time Fourier transform of the audio signal and quantized to the nearest known xylophone frequency (like a pitch chroma). We define an input “space-time image” as 3 consecutive grayscale frames. These are fed through a CNN-LSTM architecture. Once trained, the idea is to get the neural network to “see” xylophones in everyday objects that are arranged like an instrument but do not produce sound in real life. The neural network would “hallucinate” the sound of instruments from these everyday objects.
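A rough sketch of the architecture described above, with illustrative sizes: each “space-time image” of 3 stacked grayscale frames passes through a small CNN, an LSTM tracks the sequence, and a linear head regresses the quantized audio feature s(t) per time step.

```python
import torch
import torch.nn as nn

class XyloCNNLSTM(nn.Module):
    """Cross-modal sketch: video frame stacks -> per-step audio feature s(t)."""
    def __init__(self, n_pitches=12, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 8, 5, stride=2), nn.ReLU(),
            nn.Conv2d(8, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),   # -> 16 * 4 * 4 = 256 features per stack
        )
        self.lstm = nn.LSTM(256, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_pitches)     # energy per quantized xylophone pitch

    def forward(self, frames):                       # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        h, _ = self.lstm(feats)
        return self.head(h)                          # (batch, time, n_pitches)

model = XyloCNNLSTM()
clip = torch.randn(2, 10, 3, 96, 96)   # 2 clips, 10 space-time images each
s_hat = model(clip)                    # predicted audio feature over time
```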

GitHub repository with full documentation and code

Contact

hanoi7 at gmail dot com