Big Data On A Crowded Train

A crowded train is usually a bit overwhelming, especially when a guy drags his overcoat across your sandwich. However, I was recently riding a packed Acela from a customer’s site in Boston and had a good chat with my seatmate, who worked for a non-profit organization. Her job was to guide first-generation college students through an unfamiliar landscape of academics and financial aid. “It’s hard for some of the families,” she said, “who might be supportive of their kid’s aspirations, but who don’t have the experience with the system.” I’ve got two kids in school myself, and though my family’s sent a couple of generations through college, I still find it tough to help my kids keep up with their coursework and deal with all the bureaucracy. “Two kids at once!” people sympathize. Although maybe they’re just talking about the tuition.

“What’s your caseload?” I asked my seatmate, figuring it must be five or six. Her answer shocked me and revealed something fundamental about the role of big data in education.

Her caseload is 40 kids. She helps 40 kids navigate through college. Suddenly my travails with two did not seem so difficult.

“Yes,” she said, “and I have a dozen colleagues each with 40 kids too, across hundreds of colleges.”

“How do you keep track of them all?” I said.

“I don’t know,” she said. “We put them in a spreadsheet.”

“Ah,” I said. “The old pivot table.”

“I guess,” she said.

In a recent post I talked about building a lightweight Learning Management System using the MarkLogic architecture you’ve deployed to support your other business objectives. A suite of apps provides the storage, search and assembly you need for a viable learning experience. When I wrote that post, I focused on the educational content, the facts you might extract from the articles, books, or other items you already publish.

But what I realized on this train is that big data in education is not only about educational material — narratives, tables, equations, and figures. Big data in education is also about students. Maybe even primarily about students — lots and lots of them. What if you, as an educator, could find groups of similar students among thousands or millions, use information about them to assemble compelling curricula, and discover learning trends in a single student’s history so you can suggest her next step?

There is a new approach to tracking testing and course progress called TinCan. TinCan is described as an “experience-based API” because it defines how a learning system can store a student’s lifelong learning activities. It expects to receive statements like “Frank completed exercise 5 on chapter three of The Essentials of Interaction Design.” To anyone who has worked with the Semantic Web, this statement follows the familiar subject-predicate-object pattern: Frank is the subject, “completed” is the predicate, and the exercise is the object. It can therefore be expressed as a triple and stored in MarkLogic 7’s new triples index. So let’s do that. Let’s build an implementation of TinCan on MarkLogic. While we’re at it, we’ll use MarkLogic’s Java or REST API to store that experience triple. In fact, we’ll store all the experiences of all the students my train companion’s organization helps. Now she can use a visualization tool like d3 to see that many of the experiences in their TinCan streams are related. Some of them perhaps form a cluster around the concept of interaction design. Further, she can see that interaction design is related to user interfaces, and that a scholarship competition for user interfaces closes in sixty days. Since we’re me, we’ve implemented this on MarkLogic, and MarkLogic’s capabilities have helped my seatmate provide better service to her students.
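To make the subject-predicate-object idea concrete, here is a minimal sketch in Python of how a statement like Frank’s might be turned into a triple. It is only an illustration under my own assumptions: the Statement class, the slug helper, and the example.org namespace are all hypothetical, and this is neither the TinCan statement format nor a MarkLogic API call. It simply writes the triple in N-Triples syntax, a standard text format for RDF.

```python
import re
from dataclasses import dataclass

# Hypothetical namespace, for illustration only; a real deployment
# would choose its own IRIs for students, verbs, and activities.
BASE = "http://example.org/"


def slug(text: str) -> str:
    """Lowercase the text and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass(frozen=True)
class Statement:
    """One learning experience: who (actor) did what (verb) to which activity."""
    actor: str
    verb: str
    activity: str

    def to_ntriple(self) -> str:
        """Render the statement as one subject-predicate-object line."""
        subject = f"<{BASE}student/{slug(self.actor)}>"
        predicate = f"<{BASE}verb/{slug(self.verb)}>"
        obj = f"<{BASE}activity/{slug(self.activity)}>"
        return f"{subject} {predicate} {obj} ."


# The example statement from the text, recast as a triple.
frank = Statement(
    actor="Frank",
    verb="completed",
    activity="Exercise 5, Chapter 3, The Essentials of Interaction Design",
)
print(frank.to_ntriple())
```

Each line produced this way is one experience. A stream of them, one per student activity, is the kind of data you could then load into a triple store and query for related experiences.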

What do you need to know about the people you serve? Can you query for it now, on your existing system? Can you improve your students’ day by storing their learning experiences in MarkLogic?