Tuesday, March 31, 2015

Old library cards - making databases searchable

One of my jobs is to organize. I organize processes, data, meetings, people, documents ... the list goes on and on. Actually, I think a better word for what I do is "categorize" or "index". I thought about this the other day when someone asked about the details for a specific project.

"It's like old-school library cards" 

The blank stare I got back was priceless. I know, I know, I am OLD*. I then proceeded to explain how I remember the old cross-reference index cards [kartotek] that you had to use at the library to find books: the big, long boxes that pulled out so you could flip through them, intensely searching for the one card that had the book you were looking for, and then stepping over to the correct shelf to look for where the book should be.

I mean, it's like a computer database - just more literal. The main point I was trying to drive home? That if you 'tag'/index something incorrectly, you will never be able to find it. This might be even more obvious now with 'smart' computers, since computers aren't smart. They only look for exactly what you tell them to look for, and if someone has accidentally misspelled the index word... well, you're out of luck.
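For the code-minded, here is a minimal sketch of that "out of luck" moment. The card index, the book titles and the `look_up` function are all made up for illustration; the only point is that an exact-match lookup never sees a card filed under a misspelled word.

```python
# Hypothetical card index: index word -> book titles filed under it.
card_index = {
    "chemistry": ["Book A", "Book B"],
    "chemsitry": ["Book C"],  # filed under a misspelled index word
}

def look_up(index, word):
    """Exact-match lookup: return only the titles filed under this exact word."""
    return index.get(word, [])

found = look_up(card_index, "chemistry")
print(found)  # ['Book A', 'Book B']
```

"Book C" is sitting right there in the box, but nobody searching for "chemistry" will ever pull that card.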

Hence the second portion of my conversation about details, the part that got the very brilliant scientists and physicians cringing. They didn't want to talk so much about "locking things down" and "being detailed", since "maybe we could just have 'open fields' where everyone could write their own words".

Yeah. Not so much if you're asking me to build the system for you.

Why? Simple. If you let people write "whatever they feel is correct in the field" - how on earth are you going to find it later? It's fascinating to me how many people use this as an escape from having the discussions about "what is important" and "who decides what we call x". I usually explain that in the best scenario you have two choices - yes and no. (Think about a freezer box with 81 openings for Eppendorf tubes. Either there is a tube in the slot, or there isn't.)

Or a field that has to be filled in for the log function to work, like "today's date". Or CAS number, if you are a chemist. Or a patient MRN for patient samples.
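A rough sketch of what "has to be filled in" could look like in practice, assuming a simple sample log. The `SampleEntry` name, its fields and the MRN value below are invented for this example and not taken from any real system.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class SampleEntry:
    # Field names are made up for this sketch.
    logged_on: date  # "today's date": must be filled in
    mrn: str         # patient MRN: must be filled in
    note: str = ""   # optional free text

    def __post_init__(self):
        # Refuse to create the entry if a mandatory field is missing or blank.
        if not isinstance(self.logged_on, date):
            raise ValueError("logged_on must be a date")
        if not self.mrn.strip():
            raise ValueError("mrn is required")

entry = SampleEntry(logged_on=date(2015, 3, 31), mrn="12345")  # made-up MRN
```

Leaving the MRN blank raises an error right away instead of quietly producing an entry that nobody can trace later.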

Those are, for obvious reasons, not so difficult to agree on. It's the more fluffy things that make it hard. Don't get me wrong, I'm not saying "streamlining" is easy. All I'm saying is that it's imperative that you decide "most cases fall into these categories" and then leave one little option as "optional". The language equivalent is comparing French and German. The latter has a lot of very defined groups where everything is exactly clear, and then a few odd ones that do whatever they want and you need another type of subcategory to even know what you're looking for. The former, well - lots of exceptions and special cases, also depending on context... not an easy index card system.

My solution for many of these discussions? Suggesting, gently of course, that we use "choice" menus. How do we get there? By doing a couple of templates and examples (always so much easier to "show and tell" why someone's system has a few issues), then having a discussion (feedback loop) again about "which fields are crucial for the whole system to work", and going from there. It sounds very easy; I type this and feel that it's obvious. However, it's fascinating to see the faces of people when they realize the difference between using the predefined choice "preclinical" and typing "pre-clinical": the program treats them as two different words, and then does not find all of the studies they were looking for.
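Here is one way a "choice" menu could look in code, as a sketch only. The `StudyPhase` menu and the `parse_phase` helper are hypothetical, and the options other than "preclinical" are my own placeholders for whatever categories the group agrees on.

```python
from enum import Enum

class StudyPhase(Enum):
    # Hypothetical menu; only "preclinical" comes from the discussion above.
    PRECLINICAL = "preclinical"
    CLINICAL = "clinical"
    OTHER = "other"  # the one little "optional" choice

def parse_phase(text):
    """Accept only values that are on the menu; reject everything else."""
    try:
        return StudyPhase(text)
    except ValueError:
        allowed = ", ".join(p.value for p in StudyPhase)
        raise ValueError(
            f"{text!r} is not a menu choice; pick one of: {allowed}"
        ) from None

phase = parse_phase("preclinical")  # accepted
# parse_phase("pre-clinical") raises ValueError instead of saving a second spelling
```

The point of the design is that the spelling argument happens once, when the menu is written, and not every time someone fills in the field.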

TLDR: cataloguing is something librarians know. Befriend a librarian and get help with your database. And/or get a friendly scientist turned organizer to help you distinguish between what's crucial and what's ideal.

Here is a great blog and posts about this and more, very eloquently written!

*OLD - meaning "formative years without computers or cell phones, no Internet" i.e. born before 1985-87?
