Bi-lateral
My day job is "Senior Programmer". That means I have at least 5 years of professional programming experience (14 actually). So, basically, I know what I'm doing when it comes to programming. Sure, the young kids are going to have a better grasp of the nuances of C# or MSIL or whatever Microsoft has decided we all need to know this month. But programming isn't really about the programming language you use; programming is really about knowing what you need to do with whatever language you have to use, in order to satisfy your customers. And after a while, you hit a point where you realize they're not really asking for anything new, they're just asking for it in different ways or with slight variations on a common theme. And there are dozens of books on this - written well before I figured it out for myself. So, at one level, my day jobs have all been pretty much the same kind of thing.
I imagine it's something like house-building: the plans say a wall goes here, and there's a window here and it might have a strange angle for a dormer or something. But people who have a lot of experience building houses know how to build the kinds of walls that go into standard 2x4 and sheet-rock houses. And if a given plan mixes walls and ceilings in new and unusual ways, it's not a huge problem - it's just a matter of working it through using techniques you know like the back of your hand. Sure, maybe some new guy will show up with a tool that can cut wood like an X-Acto knife through paper - but that doesn't tell him how to build a wall, and it can't teach him. That knowledge comes from practice. In many ways, like house building, programming is a craft.
My side job is also as a programmer. But I write image processing tools - a fun little niche, in my opinion. Of course I can do all the basics: read a JPG, rotate it, resize it, filter it, put text on it, etc. But that kind of stuff has been understood for decades. I didn't invent any of it, I just learned from looking at what other people have done. Image processing is very technical at its foundations, but there has been so much research and development done on this stuff over the years that, at this point, a lot of it is like following a cookbook: need to know how to rotate an image? There are a million places to look for good ways to do it; so pick your algorithm and type it in. The only thing left for much of it is to make your implementation as fast as you possibly can. Because besides accuracy, image processing users want their image to rotate as fast as fucking possible, and not a millisecond slower. But, it's still pretty much cookbook stuff at heart. So, at that level, it's just like my day job - the standard problems have already been solved, it's up to you to adapt the well-known solutions to the task at hand. The fun for me is in the optimization (the speed improvements - an art form of its own).
But, unlike my typical day-job stuff (ask the database for the data, put it in a list, tell the buttons what to do, etc.) there's also a cutting edge to image processing. There's a pretty substantial number of real scientists working on new things all the time. They work in things like computer vision, 3D rendering, and morphological processing (the intersection of set theory and image processing) - esoteric stuff, by most standards. The things they come up with are useful, exciting and often almost magical. But they are also so far ahead of the mainstream that the professional programmer community hasn't had time to come up with ways of implementing them - think of a set of cutting-edge architects who keep inventing buildings made out of exotic materials and using techniques that the average subdivision home builder couldn't possibly use and still get the job done on schedule and budget. It's nice to look at in drawings, but how the hell can the average crew build that kind of thing?
Still, my customers will read about this stuff in a magazine and ask me if I can do it: "This is cool! I need this now!" I hate to say "No", so I always try to see what I can do. And it's always fun to be able to find new things to work on. Most of the time, I can Google around enough to find a recipe I can use, and that's that. But sometimes, the thing the customer has asked about is something that is so new and esoteric that the only information I can find is in technical papers, submitted to conferences or academic journals by professors and grad students who approach the problem from a mathematical or theoretical viewpoint, written for an audience of academics, scientists, and theoreticians. There is rarely a plain-English explanation of what they're doing; there's always a bunch of long horrible equations written in terse notation where every variable has multiple super- and sub-scripts, lots of summations, glossing over details and "... this is as explained by Xhiao Lung and Frederic Grimenschtrudel in their 1986 paper, Techinques for invariant monological comprendium derivatitions and tri-quadrant bi-noodling"; and there are always graphs comparing the results of their idea to some other academics' idea - whose work I don't understand either. Once in a while I find a paper written by a student of these professors who has implemented what the professor described, but only describes a high-level summary of results (a picture of the finished house - never a description of how they actually built it). Tease.
So, my latest attempt is to do something called "tone mapping". It's basically an attempt to automatically adjust the brightness of an image so that previously invisible details in light and dark areas are made visible without horribly distorting the overall brightness of the image - for example, given an image of a dark room with a bright window, it would bring out details in the shadows and details in the bright areas at the same time (and in a way that looks natural) but wouldn't affect the middle tones much. Try that in Photoshop sometime, to see how difficult it is using standard tools. Well, the heart of the trick lies in knowing what constitutes a "detail", and the latest techniques for this rely heavily on something called a "bilateral filter". Roughly, this is a blurring filter that can recognize abrupt changes in image intensity - and if it sees a sharp change in intensity in a group of neighboring pixels, it assumes that it's looking at a detail and tones down the blur effect in that area. Incidentally, this reminds me of how automatic focus works in cameras: they look at a small part of the image (usually the center) and adjust the focus in and out in order to find the spot that maximizes the intensity differences between pixels - higher intensity difference = higher contrast = sharper focus. Squint your eyes, contrast goes down, image gets blurry.
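To make the "tones down the blur near sharp changes" idea a bit more concrete, here is a minimal sketch of one common way a bilateral filter is formulated, on a 1-D row of grayscale intensities rather than a full image. It is an illustration under assumptions, not the method from any particular paper: the function name `bilateral_1d` and the parameters `radius`, `sigma_space`, and `sigma_range` are made up for this example. Each neighbor gets two weights, one for how close it is and one for how similar its intensity is, and the two are multiplied together.

```python
import math

def bilateral_1d(signal, radius=2, sigma_space=1.0, sigma_range=10.0):
    """Edge-aware blur of a 1-D list of intensities (illustrative sketch only)."""
    n = len(signal)
    out = []
    for i in range(n):
        center = signal[i]
        total = 0.0
        weight_sum = 0.0
        # Only look at neighbors that actually exist (window is cut off at the ends).
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            # Spatial weight: nearer neighbors count more, as in an ordinary Gaussian blur.
            w_space = math.exp(-((j - i) ** 2) / (2 * sigma_space ** 2))
            # Range weight: neighbors whose intensity differs a lot count less,
            # which is what keeps the blur from smearing across an edge.
            diff = signal[j] - center
            w_range = math.exp(-(diff ** 2) / (2 * sigma_range ** 2))
            w = w_space * w_range
            total += w * signal[j]
            weight_sum += w
        # weight_sum is at least 1.0 because the center pixel always has weight 1.
        out.append(total / weight_sum)
    return out

# A dark region next to a bright one: the step between them should survive.
row = [10.0] * 5 + [200.0] * 5
smoothed = bilateral_1d(row)
```

With a small `sigma_range`, pixels on the far side of the step get a weight near zero, so each side is averaged only with itself; with a very large `sigma_range` the range weight approaches 1 everywhere and the result behaves like a plain Gaussian blur.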
Now, blurring an image happens to be a cookbook technique. It's very basic; a simple blur is one of the first things a budding image programmer is likely to learn. The typical bilateral filter is done with a "Gaussian" filter (a well-known cookbook filter) but with a twist. And that twist is the key to the whole thing. Now, I spent a week over Christmas playing with my Gaussian filter to try to improve its performance; I certainly understand how it works in practice (it's just a simple weighted average), but I'm not sure about the mathematical theory behind it (the weightings used are what makes it a "Gaussian", and I don't know why you need those particular weights or why they do what they do). And when the academic papers talk about modifying their Gaussian filters, they're doing it from the deeper mathematical viewpoint, which I don't understand, and can't seem to make heads or tails of. And, of course, no cookbook has caught up to what they're talking about. So I suffer through these papers, hoping one will offer me a plain-English explanation - sometimes I never find it.
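The "simple weighted average" view of a Gaussian blur can be sketched in a few lines. This is a rough illustration, assuming a 1-D signal and clamped borders; the names `gaussian_kernel` and `blur_1d` and the choice of `radius` and `sigma` are hypothetical, not taken from any library. The weights are samples of the bell curve exp(-x² / 2σ²), scaled so they add up to 1, which is why the blur doesn't change overall brightness.

```python
import math

def gaussian_kernel(radius, sigma):
    """Return 2*radius + 1 bell-curve weights that sum to 1."""
    weights = [math.exp(-(x * x) / (2 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_1d(signal, kernel):
    """Weighted average of each sample with its neighbors (edges are clamped)."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            # Clamp the index so samples past either end reuse the border value.
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

kernel = gaussian_kernel(radius=2, sigma=1.0)
blurred = blur_1d([0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0], kernel)
```

The center weight is the largest and the weights fall off symmetrically on both sides, so a single bright sample gets spread into a small bump. For 2-D images the same kernel is usually applied to the rows and then to the columns.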
It's unfortunate for me that you need a post-grad degree in mathematics to understand the state-of-the-art in computer imaging. But, that's the way it is.
Too much typing for a Saturday night.