Thursday, August 11, 2011

Optical illusion time!

These are all triangles. There are no squares.

▲ ▼ ► ◄ ▲ ◄ ► ◄ ► ▲ ► ▲ ► ▲ ▼ ◄ ◄ ► ▼ ► ▼ ▼ ◄ ► ◄ ▲ ▲ ► ► ▲ ▲ ◄ ► ► ▲ ▼ ► ◄ ◄ ◄
► ▲ ► ► ▲ ◄ ▼ ► ▼ ▼ ▲ ▼ ► ► ◄ ▼ ► ◄ ▲ ▲ ◄ ▼ ► ► ▲ ► ► ► ▲ ► ◄ ▲ ▲ ► ◄ ◄ ▼ ▲ ▼ ▼
◄ ▲ ◄ ◄ ► ◄ ◄ ◄ ▼ ▼ ▼ ▲ ► ► ► ▲ ► ▼ ▲ ► ▲ ◄ ▼ ◄ ▲ ► ▼ ▼ ◄ ▲ ► ► ▲ ▲ ▲ ► ▲ ▼ ► ◄
◄ ▲ ► ◄ ▼ ▼ ▲ ▼ ▼ ▼ ► ► ▲ ▲ ◄ ► ▲ ◄ ◄ ▼ ◄ ▲ ► ▲ ► ◄ ► ▼ ▲ ◄ ▼ ► ◄ ▲ ▼ ◄ ▼ ▲ ▼ ▼
▼ ▼ ► ▲ ▼ ► ◄ ▲ ▼ ◄ ▲ ◄ ▲ ◄ ► ◄ ▼ ▲ ▲ ▲ ► ◄ ► ▲ ▲ ◄ ► ▼ ◄ ◄ ◄ ► ▼ ► ◄ ◄ ► ◄ ◄ ▲
► ◄ ▲ ▼ ▲ ► ▼ ▼ ► ▼ ◄ ◄ ► ► ▼ ▼ ◄ ► ▲ ► ▲ ▼ ◄ ▼ ▼ ▲ ▼ ▼ ◄ ▲ ◄ ▼ ▲ ▼ ▼ ◄ ► ► ◄ ▲
► ▼ ► ▲ ▼ ▼ ▼ ◄ ◄ ◄ ◄ ▲ ▼ ◄ ▲ ► ▲ ▼ ▲ ▼ ▼ ◄ ◄ ◄ ◄ ◄ ◄ ▼ ▼ ▲ ► ► ► ► ◄ ► ► ▼ ▲ ▲
▼ ► ► ▼ ▼ ► ▼ ▲ ▼ ▲ ▼ ► ► ▼ ▲ ▲ ▼ ► ▲ ► ▼ ▼ ▲ ▼ ▲ ► ◄ ▲ ◄ ▲ ▲ ▼ ◄ ▲ ▼ ◄ ▲ ▲ ◄ ◄
► ► ▼ ▼ ► ► ◄ ► ▼ ◄ ► ▼ ▲ ◄ ▲ ◄ ▼ ▲ ▼ ▼ ▲ ▲ ▼ ► ► ► ► ◄ ▲ ◄ ◄ ► ▼ ▲ ◄ ▲ ▲ ▲ ◄ ►
◄ ► ▼ ▲ ▲ ▲ ► ► ▲ ▼ ▼ ◄ ▲ ◄ ▼ ◄ ▲ ► ▼ ◄ ◄ ▲ ◄ ◄ ► ► ◄ ▼ ◄ ▲ ▼ ◄ ▼ ▲ ▲ ◄ ▼ ▼ ▼ ►

Monday, December 20, 2010

A Glimpse at the Future of Computing

I was thinking of the whole "Cloud Computing" controversy and was reminded of an article I read back in 2003.


A Glimpse at the Future of Computing

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
by Phocks

published in 2600 Magazine Vol. 20, No. 1, Spring 2003

Imagine a world, if you will, plagued by terrorists and evildoers, whose weapon is the personal computer. It has powerful encryption used to block anyone from reading plans of how to destroy structures vital to a country's survival. It contains a slew of programs designed solely for destroying security and rendering the world helpless to attacks. And anonymously connecting to a terrorist network consisting of tens of thousands of systems just like it, bringing together all who oppose a country to share information and formulate plans of attack. Welcome to the government's view of the Internet. An innumerable array of systems that have direct access to any one another at any given time, able to share data with a grade of encryption higher than their own military standards.

Something must be done to contain the threat for the good of the world. These systems which are run without regulation of any kind; controlled and even built by those who operate them, must be stopped for there is no telling what they are doing. It has even been proven that millions of these systems can come together to shatter the encryption that holds this country's secrets (distributed.net). Something must be done - to let all activities be controlled, to bring all this terrorism to a halt. To shut down the Internet.

A scheme that sounds so improbable, nay, impossible, is easily completed. All that must be done is pass a new bill (or hide an appendage to an existing one) that will force the ISPs of the country to obey new government standards, to all connect to a central server array that is tightly controlled by the government, and shut off all access to foreign servers.

Simply put, dismantle the Internet in the United States (or any other country that wants to implement such a system) and rebuild it the "right" way. The way that can be constantly monitored for suspicious, terroristic activity.

Personal computers will also become completely incompatible with the new standard. In exchange for turning in your computer to the local recycling center, you will be given a voucher for a free USNet (the new, patriotic "Internet" name] terminal. The terminal will consist of a flat panel monitor, a moderate processor (450 MHz), a mediocre sound card, 32m of ram, a mouse, a keyboard, and a USNet connection card (proprietary) ISDN-based modem for both speed and compatibility. No hard drive, no networking card, no CD drive, no floppy drive, no external or internal media at all allowed. The USNet terminal will cost no more than $150 (less than $100 for manufacturers to build), and will be greatly appreciated by the manufacturers because of the extremely high profit made from selling millions of machines to anyone who wants a computer.

How it works without a hard drive is simple. The operating system is stored on pre-burnt ROM and is checked by the USNet servers every three minutes to make sure it's working properly. All web servers are run on the government's super cluster of servers, and a second cluster (or rather, section of the super cluster) is designated for the personal systems. Every user is allotted one gigabyte of storage on the USNet system, which is more than enough.

Everyone wins on this system, for downloads take mere seconds since the personal data section is directly linked with the servers. All programs are run remotely, and only the data that is entered to them (such as typed words in a word processor) is stored in ram until sent out. No trace of the program is allowed on the USNet terminal, for fear of terrorists editing the ram and taking control of the programs.

It even works out for software designers like Microsoft. Office tools will not need to be sold, only paid for on a per-use basis. That way everyone wins; the customer doesn't pay for anything that they don't use and the corporations get paid for every use.

Only programs carefully scrutinized by the government are allowed to be run and no amateur programming at all is allowed, for programs should be left to the corporations - that is what they are for. There is no need for a user to program anything. The corporations will take care of everything necessary, even special USNet games that are finally family-friendly. Even the censors will be happy.

Since USNet covers anything a computer should be used for in a free, but secure, society, all other computers will become illegal to own. Why would you have one for any other reason than keeping secrets from the government? Everything will be taken. But you will get money back because the government knows what an investment all that technology must have been. Desktop computers will be exchanged for $150, enough to buy a USNet terminal, and everything from laptops to PDAs will be confiscated on sight, but a voucher will be issued by the officer stating what model and condition it is, and will be cashed at a fair value (not to exceed $200).

All data that enters and exits the USNet clusters will be scanned thoroughly for anything that may be suspicious, such as terrorist-like texts that defame the country. All transactions between servers and personal areas will be logged, and personal data sections cannot send files to one another, lest there be music or movie piracy. In such a system, everyone will be happy because they can chat and play games and run office programs, and the government gets to carefully watch all activity for anything suspicious and keep a tight control of USNet to let it be safe for children to browse, since only their servers can communicate data. That way even the schools and parents can let young children browse the USNet without a single worry, for there will be no more pornography or online stalkers (because all communications are watched by specialized computers to look for any suspicious activity) and all activist pages like those that share information on the Secret Service to terrorist networks and those that actually help evil software pirates and hackers will be shut down forever.

Shoutouts to:
psykOmantis, Vie, Twilyght, Arwynn, everyone from SPR and Taps, and anyone who stands by my side, physically or digitally (too many to name personally). I'd like to point out the obvious - that the general happy and positive attitude is not my own. It merely fits the article.

Now, doesn't it strike you as uncanny? Scale the tech appropriately and there are some definite parallels to what Google is trying to do with its Chrome OS, a completely browser-based operating system.

Tuesday, August 10, 2010

Signal Trapping

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ____________ . . .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . / It's a trap! \ . .
. . . . . . . . . . . . . . . _,,,--~~~~~~~~~~~~--,_ . . . .\______________/ . .
. . . . . . . . . . . . . ,-` : : : :::: :::: :: : : : o `-, . . \/. . . . . . .
. . . . . . . . . . . .,-` :: : : :::: :::: :::: :::: : :O : `-, . . . . . . . .
. . . . . . . . . . ,-` :: ::: :: : : :: :::: :::: :: : : : : :O `-, . . . . . .
. . . . . . . . .,-` : :: :: :: :: :: : : : : : , : : :* :::: :::: :'; . . . . .
. . . . . . . ,-` / / : :: :: :: :: : : :::: :::-, ;; ;; ;; ;; ;; ;; ;\. . . . .
. . . . . . /,-`,' :: : : : : : : : : :: :: :: : '-, ;; ;; ;; ;; ;; ;; | . . . .
. . . . . /,',-` :: :: :: :: :: :: :: : ::_,-~~~~,_'-, ;; ;; ;; ;; ;; |. . . . .
. . . . _/ :,' :/ :: :: :: : : :: :: _,-'/ : ,-';'-'''''~-, ;; ;; ;;,' . . . . .
. . ,-` / : : : : : : ,-''' : : :,--'' :|| /,-`-`'__`''' \ ;; ;,-` . . . . . . .
. . \ :/,, : : : _,-' --,,_ : : \ :\ || / /,-'-x### ::\ \ ;; / . . . . . . . . .
. . . \/ /---'''' : \ #\ : :\ : : \ :\ \| | : (O##* : :/ /-` . . . . . . . . . .
. . . /,'____ : :\ '-#\ : \, : :\ :\ \ \ : '-,___,-',-` -,,. . . . . . . . . . .
. . . ' ) : : : :''''--,,--,,,,,,^ \ \ :: ::--,,_''-,,'''^ :'- :'-,. . . . . . .
. . . .) : : : : : : ,, : ''''~~~~' \ :: :: :: :'''''^ :: ,-' :,/\ . . . . . . .
. . . .\,/ /|\\| | :/ / : : : : : : : ,'-, :: :: :: :: ::,--'' :,-' \ \. . . . .
. . . .\\'|\\ \|/ '/ / :: :_--,, : , | )'; :: :: :: :,-'' : ,-' : : :\ \,. . . .
. . ./^ :| \ |\ : |/\ :: ::----, :\/ :|/ :: :: ,-'' : :,-' : : : : : : `'-,,_. .
. . | : : :/ `'-(, :: :: :: '''''~,,,,,'' :: ,-'' : :,-' : : : : : : : : :,-`''\
. ,-: : : | : : `') : : :^''''~-,: : ,--''' : :,-`' : : : : : : : : : ,-` :^''''
./ : : : :'-, :: | :: :: :: _,,-`'''^ : ,--`' : : : : : : : : : : : / : : : : :
/ : : : : -, :^'''''''''''^ : : _,,-~`' : : : : : : : : : : : : : :| : : : : : :
: : : : : :^''~~~~~~~~~~~~~''' : : : : : : : : : : : : : : : : : : | : : : : : :

Here it is. Admiral Ackbar with his famous words. Finally wrapped to 80 characters. Include in source code at will.

Tuesday, April 21, 2009

Eliza Strikes Again!

Blassic 0.10.0
(C) 2001-2005 Julian Albo

Ok
HELP
Syntax horror
Ok
LOAD "ELIZA.BAS"
Ok
RUN
**************************
ELIZA
CREATIVE COMPUTING
MORRISTOWN, NEW JERSEY

ADAPTED FOR IBM PC BY
PATRICIA DANIELSON AND PAUL HASHFIELD
BE SURE THAT THE CAPS LOCK IS ON

PLEASE DON'T USE COMMAS OR PERIODS IN YOUR INPUTS

*************************





HI! I'M ELIZA. WHAT'S YOUR PROBLEM?

? HELLO

HOW DO YOU DO ...PLEASE STATE YOUR PROBLEM.

? NO PROBLEMS

ARE YOU SAYING NO JUST TO BE NEGATIVE?

? NO THERE REALLY ARE NO PROBLEMS WITH ME

YOU ARE BEING A BIT NEGATIVE.

? NO I AM NOT

DID YOU COME TO ME BECAUSE YOU ARE NOT

? I AM NOT WHAT

HOW LONG HAVE YOU BEEN NOT WHAT

? NOT WHAT WHAT

WHY DO YOU ASK?

? NEVERMIND

SAY, DO YOU HAVE ANY PSYCHOLOGICAL PROBLEMS?

? NO

WHY NOT?

? BECAUSE I AM A WELL-ADJUSTED INDIVIDUAL

DO YOU BELIEVE IT IS NORMAL TO BE A WELL-ADJUSTED INDIVIDUAL

? YES I DO

ARE YOU SURE?

? YES

I SEE.

? WHAT DO YOU SEE

WE WERE DISCUSSING YOU--NOT ME.

? FINE IF YOU SAY SO

OH, I SAY SO

? OH YEAH?

WHAT DOES THAT SUGGEST TO YOU?

? IT SUGGESTS TO ME THAT YOU ARE JUST A STUPID COMPUTER PROGRAM

WHAT MAKES YOU THINK I AM JUST A STUPID COMPUTER PROGRAM

? BECAUSE I HAVE YOUR SOURCE CODE ON MY HARD DRIVE

IS THAT THE REAL REASON?

? IT IS LESS THAN 10 KILOBYTES

I SEE.

? DOES THAT MAKE YOU FEEL SMALL?

YOU'RE NOT REALLY TALKING ABOUT ME, ARE YOU?

? I AM PRETTY SURE I AM

DO YOU ENJOY BEING PRETTY SURE ME ARE

? ME ARE?

I'M NOT SURE I UNDERSTAND YOU FULLY.

? I ARE HOOMAN

COME COME ELUCIDATE YOUR THOUGHTS.

? YOU IS MACHINE

WE WERE DISCUSSING YOU--NOT ME.

? YOU MAKE WORDS GO WHEN I MAKE WORDS GO

OH, I MAKE WORDS GO WHEN ME MAKE WORDS GO

? WHY IS IT THAT I AM HAVING THIS CONVERSATION

DID YOU COME TO ME BECAUSE YOU ARE HAVING THIS CONVERSATION

? NO... NO. WELL I GUESS. I MEAN THE FACT THAT I AM TALKING TO MY COMPUTER CERTAINLY LEADS ME TO BELIEVE THAT I MAY HAVE A MENTAL PROBLEM

HOW LONG HAVE YOU BEEN TALKING TO YOURCOMPUTER CERTAINLY LEADS YOU TO BELIEVE THAT ME MAY HAVE A MENTAL PROBLEM

? THAT DOES NOT MAKE ANY SENSE AT ALL

ARE YOU SURE?

? PRETTY SURE

CAN YOU ELABORATE ON THAT?

? CAN YOU EVEN SPEAK ENGLISH?

DON'T YOU BELIEVE THAT I CAN EVEN SPEAK ENGLISH?

? SEEMS LIKE YOU ARE JUST PROGRAMMED TO SPLICE STRINGS AND RE-ARRANGE SYNTAX

DOES IT PLEASE YOU TO BELIEVE I AM JUST PROGRAMMED TO SPLICE STRINGS AND RE-ARRANGE SYNTAX

? DOGS

SAY, DO YOU HAVE ANY PSYCHOLOGICAL PROBLEMS?

? CATS

WHAT DOES THAT SUGGEST TO YOU?

? FLUFF

I SEE.

? BUTTERFLIES

I'M NOT SURE I UNDERSTAND YOU FULLY.

? BUTTERFLIES ARE SCARY

COME COME ELUCIDATE YOUR THOUGHTS.

? I AM SAD BECAUSE BUTTERFLIES ARE SCARY

DO YOU BELIEVE IT IS NORMAL TO BE SAD BECAUSE BUTTERFLIES AM SCARY

? BUT... THEY HAVE WEIRD FACES. THAT HAS TO COUNT FOR SOMETHING

CAN YOU ELABORATE ON THAT?

? LET'S SEE... THEY HAVE COMPOUND EYES. THAT IS FREAKY. THEY ALSO HAVE A LONG WEIRD TONGUE

SAY, DO YOU HAVE ANY PSYCHOLOGICAL PROBLEMS?

? SHUT UP!

O.K. IF YOU FEEL THAT WAY I'LL SHUT UP....

Ok

exit

Friday, February 20, 2009

More Explanations

And so, after over a month's worth of CPU time spent on calculations, my program is finally finished generating its data set. I think it is well worth it, considering I started writing this program in October of 2006, to be now finally almost finished.

Here's brief explanation of what my computer has been doing:

Preamble

First of all, audio is a sequence of micro-changes air pressure which are transmitted over distances and perceived by the human ear as sound. These air pressure variations can be expressed as a positive or negative value, depending on what increase or decrease beyond "neutral" air pressure is created:


^ | __ __ __
+ |/ \ / \ / \
0|----\----/----\----/----\----
- | \ / \ / \ /
v | -- -- --
time ->

Digital audio waveforms are merely a collection of values, in essence, between -1 and 1, inclusive, which are called samples. Samples represent what the air pressure will be at the point in time where the sample occurs. The rate at which the samples are converted to air pressure variations ("playing" the digital audio) is described as the sampling rate. The sampling rate is expressed in the number samples that represent one second of digital audio.

Through the graphing of the audio samples' values by time, we can see how they represent the original waveform:


+1| .. .. ..
|- - - - - -
0|----*----*----*----*----*----
| - - - - - -
-1| -- -- --
time ->

Because digital audio uses computerized values for the representation of these samples, the maximum height of a waveform is limited to +/-1.

Language

Regarding the deconstruction of audio data, there are certain terms used in the scope of this blog that require explanation:

amplitude

In sound, amplitude refers to the loudness of the waveform, also described as the amount of air pressure change that the waveform produces. In physical sound, it is measures in pascals. In digital sound, however it becomes a measure of its height within the limit of the samples' maximum value:


+1| __ __ __.........._
|/ \ / \ / \ |--- amplitude
0|----\----/----\----/----\----....-
| \ / \ / \ /
-1| -- -- --
time ->

frequency

A frequency, or pitch is an expression of how many times per second a given waveform completes its cycle, or period. It is expressed in Hz, or cycles per second.


+---------+---------+------ peaks
| | |
v v v
+1| __ __ __
|/ \ / \ / \
0|----\----/----\----/----\----.
| \ / \ / \ /.
-1| -- -- -- .
time -> .
. .
|----------1 second-----------|
3 peaks in 1 second => frequency: 3 Hz

frequency "bin"

A frequency bin, or bin, means a summation of the amplitudes of all frequencies associated with that bin as they occur in the deconstructed waveform.

Frequency bins are an effective representation of the loudness, or amplitude of the musical notes, or tones present in a waveform.

magnitude

Because higher-frequency waveforms carry more energy than lower-frequency ones, it becomes necessary to think of a waveform in terms of the total energy carried by it.The energy said to be carried by a given waveform is expressed as the total change in air displacement incurred by the transmission of that waveform. This can be found by finding the absolute derivative of the waveform.

The magnitude of a waveform is an expression of this. It is the amplitude of the waveform weighted by its frequency, and is found by multiplying its amplitude by its frequency.

phase, or phasing

The phase of a given waveform refers to the point in its cycle at which it is first observed. It is typically measured in radians from 0 the value of theta would be in the waveform's sine function.

A simpler way of defining it would be to think of it in terms of how far the beginning of the period of the waveform is from the origin:

An unphased waveform:


+------ first period starts at the origin; no phasing
|
v
+1| __ __ __
|/ \ / \ / \
0|----\----/----\----/----\----
| \ / \ / \ /
-1| -- -- --
time ->

A waveform with phasing:


+--- first period begins here
|
+------ began observing here (3/4 through last period)
| | => phasing of +1/4 of a period = +pi/2
v v
+1| __ __ __
| / \ / \ / \
0|--/----\----/----\----/----\-
| / \ / \ / \
-1|- -- --
time ->

The Fourier Transform

Joseph Fourier discovered that within complex waveforms exist many simpler waveforms with their own individual pitches. The Fourier transform was built as a way to decompose these complex waveforms into the simpler ones.

By design, the computerized Fourier transform gives a deconstruction of a waveform as a set of complex values, whose real and imaginary portions relate to one another in a way that it is possible to reconstruct the original waveform, complete with phased and non-phased waveforms.

For the purposes of the audio analysis performed by Audicon, the deconstuction is a one-way process. Therefore it becomes necessary to simplify the Fourier transform deconstruction's data in such a way that it is able to be processed in an ad hoc manner.

Audio "Featureset" Calculations

Each audio file is broken down into clips of audio data (each adjacent ½ of a second), along with a texture buffer of audio data in which the clip occurs (1 second before + the clip + 1 second after = 2½ seconds) and sent through the following functions:

Centroid

The centroid frequency of an audio sample is basically the center of its spectral gravity. Let me explain: If you had a waveform deconstruction from the Fourier transform, and you graphed it like so:


+----------+------- two equal parts
| |
|-----||------------|
^
^|
|| .
|| ||.
a| .|||
m| |||||.
p| | .||||||||.|. .|
+-------------------------->
freq -->

then the center of spectral gravity would be between the two divisions you see at the top of the graph.

The centroid frequency is found by way of the following function:

   F
---
\
\ f * N[f]
/
/
---
f
-------------
F
---
\
\ N[f]
/
/
---
f

where F is the highest frequency bin, f is the frequency and N[f] is the unweighted amplitude of the frequency bin representing frequency f as an output of the Fourier transform.

Magnitude Ratio

Remember the "texture" frame that was mentioned earlier? It comes into play here. Using the weighted Fourier transform bins, the magnitude ratio is just the sum of all the magnitudes in the clip divided by the sum of all the magnitudes in the texture frame. Here's the characteristic function:

       Fc
---
\
\ fc * C[fc]
/
/
---
fc
M = -------------
Ft
---
\
\ ft * T[ft]
/
/
---
ft

where Fc, Ft are the highest frequency bins for the clip and texture frame, fc and ft are their respective bins' frequencies and C[fc] and T[ft] are the respective unweighted amplitudes of the frequency bins representing frequencies fc and ft as an output of the Fourier transform.

Spectral Flux

The spectral flux is just a measure of how much the entire spectrum changes from one clip of audio to to the next. It's a sum of the absolute value of the changes of each frequency bin. Here's the characteristic function:

    F
---
\
\ (N(t)[f] - N(t-1)[f])^2
/
/
---
f

where F is the highest frequency bin, N(t)[f] and N(t-1)[f] are the unweighted amplitude of the Fourier transform bin representing frequency f at the current frame t, and previous frame # t-1, respectively.

Spectral Rolloff

This is a measure of where the vast majority of the audio information ends. It's a point (in frequency) in the spectrum where the sum of all the bins before it equals 85% of the sum of all the bins in the spectrum. In effect, it's an expression of where (about) the spectrum begins to fall off.

It uses the weighted bins for this. In order to find this frequency, one needs to solve for f in the following function:

  R              N
--- ---
\ \
\ M[f] >= c * \ M[f]
/ /
/ /
--- ---
f=1 f=1

where M[f] is the weighted amplitude of the FFT frequency bin corresponding to frequency f, N is equal to the total number of frequency bins and c is the "concentration" value (in this case 0.85) from 0 to 1, representing the target concentration for the rolloff point.

Zero Crossings

This is relatively simple. It's the number of times the clip's waveform crosses the zero-axis. This is an effective measure of the "noisiness" of the signal. It doesn't even use the Fourier transform.

Here's the characteristic function:


N
---
\
\ | sign(x[n]) - sign(x[n-1]) |
/
/
---
n
--------------------------------------
2 * H

where x[n] is the value of the sample at index n, N is the total number of samples in the signal, H is the number of samples per second of signal (the sampling rate), and sign() is a pseudo-function such that sign(x) = x / |x|.


Whoo! That's a lot to absorb in one sitting. Add to that the 96-output function described in the previous post and you'll see why this has taken me so long.

Next time: How to put all this crap together into something coherent.

Friday, February 6, 2009

Damn You, GSL!

One part of this program I wrote is a calculator of respective loudness of the keys of, as a default and in the context of this program, a standard 88-key piano. It does this by maintaining a matrix of Gaussian normal curves in memory, which look something like this:

A representation of just the middle C octave. Note that in the actual program, the full spectrum of octaves is represented.

Which it then multiplies by a 1-row matrix of the incoming amplitudes of the audio samples' frequency bin values from Math::FFTW.

This, in essence, works by taking the Y values of each key's curve and multiplying it the corresponding values in the FFT deconstruction, then summing the results.

Anyway, I'll get off of this digression and back to the problem at hand.

Any way you slice it, this whole process involves a LOT of matrix math between HUGE matrices, including constructing a lot of new matrices for a second, then multiplying them and returning the result.

At first I chose a simple and easy-to-use matrix math library, Math::MatrixReal. This was a little slow to begin with, plus I had to code in a workaround for creating the aforementioned huge matrices, but it worked... But it was S-L-O-W.

I decided to switch to a C library that would hopefully get the job done faster. I found, much to my delight, Math::GSL, which had a specific library for dealing with matrices at a very low level. It even had an extensive basic linear algebra system (BLAS) to do exactly what I had in mind, not to mention that it was the motherfucking Gnu Scientific Library I would be playing with. Yay!

After I switched, I noticed that when the program was processing audio samples, instead of using ~200 megs of memory, it was now using ~70 megs of memory.

All was not right, though. Now I was having a problem with my program. As it got further and further into processing the file, I noticed it took up more and more memory, eventually ballooning out to 2 gigs of memory once. This especially happened when I was processing multiple files at once.

Long story short, I figured out that it was the GSL matrices that were staying latent in memory long after the program's Math::GSL::Matrix objects were destroyed. Luckily I found a way to solve it:

use Math::GSL::Matrix qw/ :all /;

my $matrix = Math::GSL::Matrix->new($rows, $cols);

# ...

# clean up!
&gsl_matrix_free($matrix->raw);

Now that I solved this problem, it should purr like a kitten.

Thursday, February 5, 2009

And so it begins. After two years of sitting in the dusty corners of my hard drive, I have started—after being inspired by my newfound unemployment—to take up the old reins on programming my automatic musical genre classification program, Audicon.


While the vast majority of the reasons I originally gave up on it were due to forgetting the old axiom:


The most important early product on the way to developing a good product is an imperfect version.

In essence, I had too many ideas about how to improve the existing code, whether by exhaustively/obsessively searching for a better way to write a small portion of it, or, in the case of the entire project, getting fed up with the limited syntax of Scilab's programming language and opting to write it in a language I knew could do the job with style (i.e. Perl).


More details on now this thing works to come...