The Real Semantic Web
My understanding of the Semantic Web is that we are planning to encode web pages so that the data contained in the pages will be understood by computers. The Semantic Web is being touted as the next version of the Internet: Web 3.0, if you will.
If you’re not familiar with the Semantic Web, let me take a moment to provide some background. We create web pages every day. Those pages contain data. For example, I may write a blog post about an upcoming party: “I’m having a party at Chucky Cheese on Saturday at 10 a.m.” Data can be gleaned from that post; Chucky Cheese is a location and Saturday at 10 a.m. is a date/time value. A human can easily pick out the data elements and (if they’re invited) will know where to go and when to show up. Today’s computers can’t even begin to understand that information unless you place clues around those pieces of data. Much like HTML, you would need to use codes to tell a computer, “This is the location of the party, and this is the date and time.”
Example:
I’m having a <event>party</event> at <location geo="773838383,3838282">Chucky Cheese</location> on <start-time datetime="2008-03-29 10:00">Saturday at 10 a.m.</start-time>
In my party example, the encoding could result in computers having the ability to pick the information off of a web page and do things like insert the event right into your calendar, or copy the location coordinates to your GPS system so that you’ll know exactly where to drive.
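To make the consuming side concrete, here is a minimal Python sketch that pulls the three tagged values out of the party markup with the standard library’s html.parser and turns them into a bare-bones iCalendar event. The tag names come from the example markup; the helper names EventExtractor and to_vevent are hypothetical, and the sketch assumes the start-time value sits in an attribute named datetime. A real calendar file would also need properties such as UID and DTSTAMP, which this sketch deliberately omits.

```python
from html.parser import HTMLParser


class EventExtractor(HTMLParser):
    """Hypothetical helper: collects text and attributes from the <event>,
    <location> and <start-time> tags used in the party example."""

    TAGS = {"event", "location", "start-time"}

    def __init__(self):
        super().__init__()
        self.fields = {}      # tag name -> {"text": ..., plus any attributes}
        self._current = None  # tag we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.TAGS:
            self._current = tag
            self.fields[tag] = {"text": "", **dict(attrs)}

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text outside the three tags (e.g. " at ", " on ") is ignored.
        if self._current:
            self.fields[self._current]["text"] += data


def to_vevent(fields):
    """Build a minimal iCalendar VEVENT (no UID/DTSTAMP) from the fields."""
    start = fields["start-time"]["datetime"]  # assumed attribute name
    # "2008-03-29 10:00" -> "20080329T100000" (local "floating" time)
    dtstart = start.replace("-", "").replace(" ", "T").replace(":", "") + "00"
    return "\r\n".join([
        "BEGIN:VEVENT",
        "SUMMARY:" + fields["event"]["text"],
        "LOCATION:" + fields["location"]["text"],
        "DTSTART:" + dtstart,
        "END:VEVENT",
    ])


if __name__ == "__main__":
    markup = (
        "I'm having a <event>party</event> at "
        '<location geo="773838383,3838282">Chucky Cheese</location> on '
        '<start-time datetime="2008-03-29 10:00">Saturday at 10 a.m.</start-time>'
    )
    parser = EventExtractor()
    parser.feed(markup)
    parser.close()
    print(to_vevent(parser.fields))
    # The geo attribute is passed through untouched for a GPS-style consumer.
    print(parser.fields["location"]["geo"])
```

Running it prints a five-line VEVENT containing DTSTART:20080329T100000, followed by the raw geo string. Wrapping the event in a VCALENDAR and adding the required properties would be the next step before a real calendar program could import it.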
I think this is the “pure” view of the Semantic Web. I’ve seen many people describe integration and API work, where web sites talk with one another, as semantic. While some of those approaches may use semantic tools, in those cases we’re still just talking about computers talking to computers. That’s nothing new.
So my big question regarding the Semantic Web is, “Who will mark up their web pages like my example above?” My answer to that rhetorical question is, “No one except geeks and niche groups.” All of our current encoding techniques, whether RDF or Microformats, are going to fall short when it comes to the general populace. Why? Because your average Internet publisher is never going to take the time to encode the data, and if they try, the chances are very high that it will be done incorrectly.
“But our tools will do the encoding for us,” you say. Okay, here’s my big point: If you have tools that can read your text, understand where to place the codes, and encode your data for you, then we don’t even need the codes anymore.
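The argument can be sketched in a few lines of Python. The hypothetical find_start_time function below recognizes a phrase like “Saturday at 10 a.m.” in plain text and resolves it to a concrete date; the hypothetical encode function then does what an encoding tool would do and wraps that phrase in a start-time tag. The tag adds nothing the program did not already know: once find_start_time has returned a datetime, the markup is redundant. The regular expression handles only this one phrasing and assumes the event falls within the coming week, so treat it as a toy illustration of the point, not a real natural-language date parser.

```python
import re
from datetime import date, datetime, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Matches phrases like "Saturday at 10 a.m." or "Friday at 7:30 p.m."
PATTERN = re.compile(
    r"\b(?P<day>" + "|".join(WEEKDAYS) + r")\s+at\s+"
    r"(?P<hour>\d{1,2})(?::(?P<minute>\d{2}))?\s*(?P<ampm>a\.m\.|p\.m\.)",
    re.IGNORECASE,
)


def find_start_time(text, today):
    """Return (match, datetime) for the first weekday/time phrase, or None.
    Assumes the named weekday is the next one on or after `today`."""
    m = PATTERN.search(text)
    if m is None:
        return None
    target = WEEKDAYS.index(m.group("day").lower())
    days_ahead = (target - today.weekday()) % 7  # 0 means "today"
    day = today + timedelta(days=days_ahead)
    hour = int(m.group("hour")) % 12             # 12 a.m. -> 0
    if m.group("ampm").lower().startswith("p"):
        hour += 12                               # 12 p.m. -> 12
    minute = int(m.group("minute") or 0)
    return m, datetime(day.year, day.month, day.day, hour, minute)


def encode(text, today):
    """What an encoding tool would do: wrap the recognized phrase in a tag.
    Note that it must call find_start_time first, so it already has the data."""
    found = find_start_time(text, today)
    if found is None:
        return text
    m, when = found
    tag = (f'<start-time datetime="{when:%Y-%m-%d %H:%M}">'
           f"{m.group(0)}</start-time>")
    return text[:m.start()] + tag + text[m.end():]


if __name__ == "__main__":
    sentence = "I'm having a party at Chucky Cheese on Saturday at 10 a.m."
    reference = date(2008, 3, 26)  # a Wednesday
    print(encode(sentence, reference))
    print(find_start_time(sentence, reference)[1])  # 2008-03-29 10:00:00
```

With a reference date of Wednesday, March 26, 2008, encode reproduces the start-time markup from the party example, while find_start_time alone already yields 2008-03-29 10:00, which is all a calendar program would need.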
The Semantic Web will happen once computer programs have been taught to read web pages the way a human does. If you think it can’t be done, check out efforts like PimpMyNews.com. Granted, we’ve had text-to-speech for a long time, but we’ve made significant strides in how these programs understand language. I believe our programs will make the leap to understanding the data behind the language.
Widespread encoding will never happen. Encoding can be incredibly useful for specific applications, but it’s a temporary fix, not the long-term solution. Programs of tomorrow will learn to use all of the visual cues in a web page and even understand context, based on stored experience. That is the secret to the Semantic Web. My advice for hard-core researchers: leave the codes behind and work on the real solution. The Semantic Web will happen, but not until we teach computers to read our data the way we write our data.




Jason Ryan said,
Wrote on March 31, 2008 @ 12:42 pm
I think you are pretty close to the money here, Shannon, but there is room for a couple of other points.
There is already a class of web apps producing semantic data: those that use controlled values (or controlled vocabulary) in their entry fields (flickr and Upcoming spring to mind). So there is a steadily increasing amount of structured data out there…
The other point I would make is that for some publishers (government is one), publishing structured data is not a choice but a necessity, for economic and social reasons.
I would like to be able to wait until machines can read our data; I just don’t think that is a good enough argument for not acting now. And, looking at the apps that you build, I suspect you might feel the same way.
swhitley said,
Wrote on March 31, 2008 @ 1:08 pm
You’re absolutely right, Jason. I’d just hope the true computer scientists are investing their time wisely. (Some, I fear, are not.) Leave the current wave of semantic data to grunts like me. I want them to focus far beyond where we are today.
John Atkinson said,
Wrote on March 31, 2008 @ 1:41 pm
Good thoughts. I agree that we can’t expect “regular people” to change their behavior and learn to write in a “format” computers can understand, and that the focus should be on making tech adapt to the way we write (i.e., identifying which words are “codes” based on context, meaning the surrounding words).
I feel it can be done, based on my work with text-to-speech technology, which determines pronunciation based on context (similar principle, different output).
I’m not smart enough to figure it out, but others are, and they’re probably already working on it.
Quote: The Real Semantic Web « if you sit on the fence you *will* fall in said,
Wrote on March 31, 2008 @ 2:16 pm
[...] Nice post & comment via @jasonwryan on the semantic web. [...]
Joe Moraca said,
Wrote on March 31, 2008 @ 5:09 pm
Great post. Good that you are “keeping it real” and not floating on hype. I quoted you at webdevgeeks.com.
swhitley said,
Wrote on April 1, 2008 @ 9:54 am
John, ceej75, and Joe,
Thanks for the comments and links. I love this topic. It provides so much fodder for really deep discussions about our future.
Jason Ryan said,
Wrote on April 1, 2008 @ 7:44 pm
Another example of interesting work in the public sector, Seb Chan at the Powerhouse Museum using Open Calais to index and tag their huge collection:
Bill Hood said,
Wrote on December 17, 2008 @ 7:54 am
Shannon, very interesting idea. However, you should know that the Macintosh operating system has been doing part of this for many years. Whenever I get an email about a party with a date in it, the date becomes a link that, when I mouse over it, lets me place an event on my calendar. It picks up the date, puts the event on the correct day of the calendar, and then places the name of the event in the subject line.
The Mac also does this with email addresses, places, and other pertinent information. While it is not as sophisticated as what you propose, it is getting there. I am not a programmer and do not understand the code behind it, but I know it works well. It can read dates in most formats, e.g., December 17, 2008, or even 120808, 12082008, 08122008, etc.
Likewise, I use the Answers plug-in for Firefox to highlight a place name on a web page and press the Ctrl key to get as much information about the place as I need. If the place is Anaheim, I get coordinates, weather, a map, population, and much more.
I can do the same on a person’s name.
I believe the Internet is much closer to what you describe than one might think. Granted, it is not encoded as you suggest, but there are plug-ins working to make this happen.
I have a friend in Dallas, Texas, Daniel Miller, who has been working on social software that comes much closer to linking the people of the world than anyone might imagine.
From his website at http://sugarfilled.com/daniel: Daniel Miller is a web producer with 10 years experience in information technology, eight of which have been spent exclusively in web design, development and content management. He has worked with everyone from mom-and-pop shops on the Florida coast to multinational technology companies in Sydney and Tel-Aviv.
I can’t wait until tomorrow happens on the Internet!
Bill Hood
Bill Hood Consulting
swhitley said,
Wrote on December 22, 2008 @ 11:13 am
Thanks for the comment, Bill. Yes, I’m looking forward to the day when those Mac features are pervasive.