Monday, November 22, 2010

Where Does Scientific Data Go to Die?

John Timmer has a fantastic series of articles going over at Ars Technica. It has really drawn me back into the thoughts that initially triggered my decision to start TelemetryWeb.

I don't want to regurgitate all of the information in his articles, but the gist of it is that there are no good solutions for capturing, storing, and archiving scientific data. It isn't hard to imagine the massive amounts of data that have been lost on floppy diskettes that got stashed in some research professor's desk. And even if you had the disk, do you have the complete technology stack required to read it? You'd need the disk, the correct drive, the right kind of computer, and a program to read the bits.

Switching gears a bit (but not really all that much, as you'll see), initiatives like are really cool, because they encourage scientists to make their data available online. serves as a directory for scientific data sets. Want to download raw data about the migrations of Canada geese? You might find a link to it there.

But that's part of the problem, too: all you'll get is a link. It is up to the research project to find a place to put the data online, and to maintain it for all eternity. How often do researchers get grants to keep their data online? A friend who works at the University of Minnesota School of Agriculture says that doesn't happen very often. In fact, one of her recent projects had a five-year plan. The first four years were the bulk of the research, and the fifth year was for building a system to get the findings online. Funding was suddenly dropped after the fourth year. So a publicly funded institution spent four years doing some really useful research, which could help farmers save millions of dollars and reduce the amount of chemicals they use to combat disease. But all that research is sitting in a drawer somewhere. Unused.

But let's say that you found something on that is actually available. Great! What then? Do you understand the format of the data? Do you need a proprietary software package to read it? Is there any information about how the data was collected? What instruments or techniques were used? Is the data applicable to the work you are trying to do? What are the error factors and quality metrics? Alas, doesn't address those issues.

TelemetryWeb has thus far been focused on commercial applications simply because lots of smart people have told me that there's no viable business model in the scientific research community. They may be correct, but I'd love to have an opportunity to prove them wrong. Either way, I'd love to see TelemetryWeb used to support scientific research. I've always been a bit of a science nerd, and I started out as a physics major. It is simply a personal interest of mine, and it would make me feel good.

But the thing is, I really don't see the problems of the scientific community as being significantly different from the commercial problems that TelemetryWeb is trying to solve, anyway. Long-term data warehousing, good metadata catalogs, owner control over data sharing or publication, and the ability to collaborate across geographical and organizational boundaries are all challenges that I've personally faced in my work developing commercial applications.

There are certainly a lot of scientific applications where TelemetryWeb won't always be a good fit, at least as it is designed currently. But I've already spoken with several people about scientific research projects, and I'd be happy to chat with more people on the subject.
