
Engaging Everyone

One of the biggest difficulties in open web education is building your project in such a way that it engages everyone rather than only the group of technologically savvy people who already understand the value and values of the open web. That is why we built Wikiotics from the ground up around materials and contributions that anyone can make. If we can empower people to help each other, we will teach them about the power and importance of the open web as a natural part of their work, just as Wikipedia has done for millions of people around the world.

The basic materials of language instruction are things that anyone can make. If you think our Introductory English lesson would be more effective with pictures from London, or if you think it would work better for you if it used pictures of the people and activities in your personal surroundings, you can change them. That is true whether you are a professional web designer and photographer or a kid with a camera phone. Point. Shoot. Teach. It is that simple and it is the only way our lessons get built.

If you want to turn our Chinese lesson into a Mandarin or Cantonese one, you don’t need any special training or programming expertise; all you need is a dozen sentences of recorded audio. If you don’t speak either of those dialects, there are more than a billion people who could record them for you. Our goal is to make that kind of sharing as simple as possible so that not only can some of the Mandarin speakers in the community record audio for you, but you can easily record some English sentences for them in thanks.

The raw material of language instruction is easy to make, but before the open web, there was no easy way to gather enough of it together in one place to create a universal language resource, just as there was no way to build a universal encyclopedia. The open web is the only way to make communication and collaborative creation easy enough to build either of these projects. That is the lesson that millions have learned from Wikipedia, and it is why using Wikipedia as an example will let you start a conversation about the open web with almost anyone, regardless of their level of technological expertise. If we succeed in empowering people to teach each other language, there will be millions more who understand this lesson and who see the “open web” not as an abstract concept about free technological infrastructures but rather as a vital structure supporting the activities of their daily lives.

Crossposted with drumbeat.

The Drumbeat of education

As many of you may know, I’ve been working on a language education project for the last two years, ever since running into a wall with my own Chinese studies. That project is called Wikiotics, a combination of “wiki” and “semiotics”. So far we’ve spent our time building tools for creating interactive language lessons like this sample one for English.

The Grant

On Monday we applied for funding from the new Mozilla/Shuttleworth “Open Web fellowship” program to try and support the project through a year of community building. The goal is to show people the value of the open web by engaging them in a productive community activity, like Wikipedia’s encyclopedia collaboration, that can only happen on a free and open web. The focus of our community is language instruction; our focus is showing people in the language community how the open web empowers them to do things that would otherwise be impossible.

If you’ve ever been frustrated by the lack of free, high-quality language instruction material or wondered why tools like Rosetta Stone and Pimsleur can still charge hundreds of dollars for tiny amounts of language instruction inside interfaces that are less flexible than your average web page, check out our project. Our tools will allow the community to build rich, interactive language instruction materials that are as easy to create, re-mix, and share as Wikipedia pages.

Getting Involved

We can always use more people, and getting involved at this stage is really easy: just check out the project page and leave some comments. Wikiotics means a lot to me, so I really appreciate the effort, even if it is just signing up.

If you want to do more, we’ve got a Flickr photo group where we’re collecting pictures for use in language lessons. If you have any CC-licensed* pictures, join up and add away. Pictures with clear subjects are easiest to use for language instruction, but anything you can imagine using is welcome. Think of them as picture flash cards for sentences like “the girls are walking” and you’ll get the idea. This picture is the best I could do from searching Flickr’s current pictures, but I’m sure you can all do better with your cameras and some willing subjects.

We’re also in the midst of heavy technological development for our back-end software, a lovely new wiki called ductus, built from the ground up to handle this kind of rich interactive content. If anyone is interested in Python, Django, and the possibilities of Git-based wiki development, check it out.

*CC-BY or CC-BY-SA specifically

Crossposted with drumbeat.

The Census is Private

Last night a local census taker came to my door and asked me a number of personal questions. As anyone reading this likely knows, I care deeply about my privacy, but I was happy to fill out the census. This might seem counter-intuitive, especially given all the apparent controversy over giving personal information to the government, so let me explain.

Initially, I was reluctant to participate as well, but some of the census advertising, and a little independent research, convinced me it was a good idea. Ironically, the advertising convinced me to participate not by explaining how necessary the census is but by highlighting its uselessness.

The ads that struck me are from the subway and follow this pattern: “How will we know how many ______ to provide if we don’t know how many people there are?” The blank can be anything from “hospital beds” to “teachers” to “trains”. It is a sensible plea highlighting the relationship between having reliable information about the beneficiaries of government services and the effective administration of those services. Unfortunately, it is also obviously outdated.

Do we actually rely on the census figures, taken once every ten years, to plan out how many trains to run or how many hospital beds we need? I certainly hope not. Operating a transit system or hospital in the 21st century involves collecting records more detailed than the census as a daily part of functioning. You simply cannot manage a train schedule or service changes without accurate knowledge of how many people use what trains at what times, nor can you manage hospital scheduling and inventory without knowing how many people needed what medical resources on each day of your management cycle.

The administration of government services does not depend on the information collected by the census; it produces far more accurate and detailed records than the census is set up to collect. If you are worried about the government having information about your private life, don’t worry about the census. Take some of that energy and consider what the government learns about you every time you use a MetroCard or pass a toll booth with your E-ZPass, or what it will learn when all our medical records are digitized and centralized. If you believe that not filling out the census will blind the government to the private details of your life, you need to take a better look at the details they already have.

The census is not about spying on you; it is about enfranchising you. The only government service that is apportioned by the census is representation in the national government, and it is the one that determines how much weight all of your concerns and needs for other services have for the next ten years. So I was glad to be counted, and I encourage anyone else who has avoided the census thus far to stand and be counted as well.

Hopefully, next time around we can dispense with the ritual paperwork and use the information we already have to estimate population more accurately, automatically adding millions of the poorest and most vulnerable members of our community to the count. Like most efforts to enfranchise the poor and vulnerable, it is going to be an uphill struggle.

Bittorrent and Miro, a better Distributed Proofreading

If you spend some time in the ebook community you inevitably run into Distributed Proofreaders, the collaborative proofreading group that supplies Project Gutenberg with high-quality text versions of public domain books. They are a small community of dedicated editors doing good work. Unfortunately, they are also becoming irrelevant to most of the issues in the field because their multi-layer workflow is simply too slow. When organizations like Google are releasing a million books at once, it is hard to stay relevant while struggling to complete your project’s 20,000th book, even if those books, unlike Google’s, are meticulously verified and formatted. Scale and quality both matter and, if we structure it right, we can rework our communal digitization projects to get both.

Currently, Distributed Proofreaders only releases books after spending weeks or months verifying that the text version matches the original page images. The industrial scanning efforts like Google Books and the Million Books Project generally skip verification entirely and distribute raw text versions with the photographic page images. This is perhaps the greatest key to their large size. Yes, they also paid for large-scale scanning, but scanning is easy compared to proofreading, and getting easier all the time. You can be sure that Google’s library would not be half so large if they had to pay for the kind of quality that Distributed Proofreaders provides. Unfortunately, if the price of this quality is only having thousands rather than millions of books, it is too high to continue paying.

I propose a middle road between the raw image release and the meticulous text one. What if we distributed raw image and unverified text files from day one, but built our distribution network to enable everyone downloading a copy to upload corrections and share those corrections automatically with everyone else who has a copy? If we did that, we could gain speed and scale while also building our community of contributors.

Technologically, BitTorrent and a rich client like Miro would get us most of the way there. We would make each book into a Miro channel that people would subscribe to when downloading the book. Once the book is downloaded, we would need a book reading view that we could optimize for whatever common reader actions relate to proofreading. Things like spell check and revealing the text around a section to verify academic citations spring immediately to mind. The key is that corrections should come primarily from people’s normal interactions with the books they are interested in, no altruism or active volunteering necessary. Once people have corrected their local copies, the client sends those corrections back to the central server, where they can be sent out via RSS to everyone subscribed to that book’s channel.
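To make the correction flow above a little more concrete, here is a rough Python sketch of what a client might do when corrections arrive from a book’s channel. Everything in it is an assumption made for illustration: the `Correction` record, the `apply_corrections` function, and the idea of keying text by page number are hypothetical and are not part of Miro, BitTorrent, or any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """One reader-submitted fix (hypothetical format, for illustration only)."""
    page: int   # page number in the scanned book
    old: str    # text as it appears in the unverified copy
    new: str    # text as corrected by a reader


def apply_corrections(pages, corrections):
    """Return a new dict of page texts with each correction applied once.

    A correction is skipped if the page is not held locally or if the
    `old` text is no longer present (for example, already fixed).
    """
    fixed = dict(pages)  # copy, so the caller's local copy is left untouched
    for c in corrections:
        text = fixed.get(c.page)
        if text is not None and c.old in text:
            fixed[c.page] = text.replace(c.old, c.new, 1)
    return fixed


# Unverified text keyed by page number, as downloaded with the page images.
local_copy = {1: "It was the hest of times, it was the worst of tirnes."}

# Corrections as they might arrive from the book's channel feed.
feed = [
    Correction(page=1, old="hest", new="best"),
    Correction(page=1, old="tirnes", new="times"),
    Correction(page=2, old="teh", new="the"),  # page not held locally; skipped
]

corrected = apply_corrections(local_copy, feed)
print(corrected[1])
```

Skipping corrections that no longer match is one simple way to let many readers fix the same error without conflicts; a real system would also need some way to review or revert bad corrections.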

As far as the user is concerned, she simply downloads the books she is interested in with her miro-based library manager and either fixes errors as they bother her, or leaves them alone and watches the text gradually correct itself as other people interested in the same books notice and correct errors. If the errors are really frustrating, she can always fall back to reading the page images and be no worse off than if reading on Google Books or any other large page image-based digital library.

As far as the community is concerned, we get a larger pool of potential contributors because now everyone with a copy can contribute back, and people are able to contribute by sharing spare hard drive space and unused bandwidth rather than having to donate funds to pay for central hosting and distribution. There are plenty of people in the community who have no time or inclination to proofread but would gladly download some book images and leave a torrent running in the background to help share the files more widely.

Making it easier to contribute increases the effectiveness of the project as a whole by helping make sure that all the people who care about a book have the opportunity to put their time into preserving that book. The more people care, the more work gets done. In two years of talking with people about my own book digitization projects, I have grown to have a healthy respect for how much people care about their own books and about preserving them, in whatever form.

In the end, there are only two scalable digitization strategies: teach computers to read, or harness the passion people have for their books for the benefit of us all. A handful of highly organized editors like the Distributed Proofreaders community will always have their place, but they cannot handle the scale of this project alone. We should make sure they have some help.

(Crossposted with bookliberator)

Guruplug Server

My new Guruplug, the second generation of that plug computer Eben and I keep talking about, just made it to me.

Here are the two of them side by side. The Guruplug is the smaller black one.

The Guruplug is an upgrade from the original Sheevaplug development kit in pretty much all respects. It has a more powerful processor, a much expanded array of ports, making it more capable, and is smaller in all dimensions.

The only thing that new Guruplug owners might miss from the older model is the full-size SD card slot on the side. Given that this has been replaced by 1) a microSD slot, 2) an additional USB port, which could easily take an SD card inside a USB adapter, and 3) an eSATA (Gb/s) port for connecting external hard drives at their native speeds, I don’t think many people will actually miss the old SD slot.

Now Martin just needs to get his so he can teach the Debian installer about the new hardware, and we’ll be able to put it through some paces.

I’ll be spending the meantime reading up on how to shape local network traffic so I can replace my router with my new Guruplug.