

exponentially. The adoption curve follows a path similar to Tuckman's group development model of form-storm-norm-perform (Fig. 1). Early-stage innovators collaborate to form core consensus and capability. Subsequent waves of adopters find it more difficult, due in large part to the sheer size of the group, to reach consensus, and may question earlier decisions (a chaotic period: the storm). Then, declining numbers of adopters (converted skeptics) are more likely to accept the work of earlier adopters as the norm. After reaching this stage, the community's efforts focus on performance.
As Web 1.0 demonstrated, the key to success is establishing a flexible framework that enables the entire community to adopt and grow the technology organically over time through the development of symbiotic relationships. Web 2.0 brought the advent of social media along with community-driven content, transforming publishing into an ongoing and interactive process of micropublishing and tagging (folksonomy)[2]. This accelerated the culture shift, prompting the explosion of commerce that dominates the World Wide Web today: a symbiotic relationship between publishers, retailers, and consumers.
The Semantic Web[3], sometimes referred to as Web 3.0, extends Web 2.0 through standards developed by the World Wide Web Consortium (W3C). These standards outline a flexible framework for sharing data as opposed to sharing documents, i.e., linking data and tools across application, enterprise, and community boundaries. By defining such a framework for the interoperability of data and tools, W3C Semantic Web standards enable large-scale integration of, and reasoning over, data and analytics on the web. This framework sets the stage for a cultural transformation of the way materials scientists work.
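As a rough illustration of what these standards look like in practice, the short Python sketch below uses the open-source rdflib library to express a few materials facts as W3C-standard RDF triples. The example.org namespace and property names such as yieldStrengthMPa are hypothetical placeholders invented for this sketch, not part of any published vocabulary.

```python
# A minimal sketch of W3C-style linked data using the rdflib library.
# The namespace URI and property names are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

MAT = Namespace("http://example.org/materials#")  # hypothetical vocabulary

g = Graph()
alloy = URIRef("http://example.org/materials/Ti-6Al-4V")

# Data are expressed as subject-predicate-object triples, not table rows,
# so any tool that speaks RDF can consume them without a custom schema.
g.add((alloy, RDF.type, MAT.Alloy))
g.add((alloy, RDFS.label, Literal("Ti-6Al-4V")))
g.add((alloy, MAT.yieldStrengthMPa, Literal(880)))

print(g.serialize(format="turtle"))
```

Because each triple carries its own vocabulary URIs, any RDF-aware tool can consume this output without a source-specific schema.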
INTEGRATED TOOLS
In the current state of technology, applications query sets of predefined data sources. Each data source has its own schema and format, which requires custom interfaces to connect and extract content. When a new data source is introduced or a data schema changes, application logic (e.g., business rules), presentation logic, and interfaces must change. This brittle, point-to-point architecture constrains analytic and modeling capabilities and greatly increases IT costs. Further, queries are written against specific databases, so the entire set of available databases and other content repositories is never considered: although the answer to a basic question may reside somewhere in the “system,” the query will never find it.
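To make the brittleness concrete, the hypothetical Python sketch below hard-wires separate queries to two data sources with different schemas. Every table and column name here is invented for illustration, and a change to either schema silently breaks the corresponding function.

```python
# A minimal sketch of the brittle, point-to-point pattern described above.
# Table and column names are hypothetical; each source needs its own query.
import sqlite3

def yield_strength_from_source_a(db_path: str, alloy: str):
    # Query is hard-wired to source A's schema (table and column names).
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT yield_mpa FROM alloy_props WHERE alloy_name = ?", (alloy,)
    ).fetchone()
    conn.close()
    return row[0] if row else None

def yield_strength_from_source_b(db_path: str, alloy: str):
    # Source B stores the same fact under a different schema, so a second,
    # incompatible query is required; neither function sees other sources.
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT ys FROM mechanical WHERE material = ?", (alloy,)
    ).fetchone()
    conn.close()
    return row[0] if row else None
```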
To highlight the challenge of integrating tools over the web, consider the generic approach of using application programming interfaces (APIs) to make structured data available online. Web APIs provide simple query access to structured data over the HTTP protocol. High-profile examples include the Amazon Product Advertising API[4] and the Flickr API[5]. The advent of web APIs led to an explosion of small, specialized applications (or “mashups”) that combine data from several sources, each accessed through an API specific to the data provider. While the benefits of programmatic access to structured data are indisputable, the existence of a specialized API for each data set creates a landscape in which significant effort is required to integrate each new data set into an application. That is, every programmer must understand the methods available to retrieve data from each API, and then write custom code to access data from each data source.
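The hypothetical Python sketch below illustrates this per-API burden. Both endpoints and all JSON field names are invented for illustration; each new provider would require yet another hand-written function like these.

```python
# A minimal sketch of the per-API integration burden described above.
# The endpoints and JSON field names are hypothetical examples.
import requests

def density_from_provider_a(material: str) -> float:
    # Provider A: REST endpoint with query parameters; density under "rho".
    resp = requests.get(
        "https://api.provider-a.example/v1/materials",
        params={"name": material},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["rho"]

def density_from_provider_b(material: str) -> float:
    # Provider B: path-style endpoint; density nested under "properties".
    resp = requests.get(
        f"https://api.provider-b.example/materials/{material}/properties",
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["properties"]["density_g_cm3"]
```

Both functions answer the same question, yet neither can be reused for the other source: the custom code grows with every provider added.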
“Materials research is emphasizing the need for integrated computational tools and high-throughput experiments, and at the same time, product development is striving to more effectively integrate materials, manufacturing, and design,” explains Daniel Miracle, senior scientist in the Materials and Manufacturing Directorate at the Air Force Research Laboratory. “Integration of heterogeneous, distributed data is moving away from ‘wouldn’t it be nice’ to ‘we have to do it.’” This is where semantic technology comes in. Because it was designed for machine-to-machine interchange on the web and is based on open, mature standards, it is a perfect fit for this community.
Semantically linked data is schemaless: humans and software applications correlate and interpret information unambiguously using one standard. Linkage between multiple datasets, files, and their respective metadata is established without having to adhere to specific database table structures. As such, data changes do not “break” applications; conversely, application changes do not break the information fabric. Data operations are greatly simplified, enabling scientists to focus on critical research rather than continual data extract, transform, and load (ETL) functions. Linked data’s graph structure is machine-understandable, i.e., applications can infer meaning in the data, making them “smarter.”
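By way of contrast with the per-API sketches above, the sketch below (again using the rdflib library, with hypothetical file names and the same invented vocabulary as before) merges two independently produced RDF datasets and answers a question with one standard SPARQL query, with no per-source extraction code.

```python
# A minimal sketch of schemaless integration with linked data: two independent
# RDF datasets are merged and queried together. File names are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("lab_measurements.ttl", format="turtle")   # hypothetical dataset 1
g.parse("handbook_values.ttl", format="turtle")    # hypothetical dataset 2

# One SPARQL query spans both sources; adding a third dataset is simply
# another parse() call, and the query itself does not change.
results = g.query("""
    PREFIX mat: <http://example.org/materials#>
    SELECT ?alloy ?strength
    WHERE { ?alloy mat:yieldStrengthMPa ?strength . }
""")
for alloy, strength in results:
    print(alloy, strength)
```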
Where are all these semantic applications? Because applications operate on data, the question regresses to a more fundamental one: Where is all the linked data? The symbiotic relationship required for global adoption of Web 3.0 begins with data, and that leads to the final MGI challenge.
ACCESSIBLE DIGITAL DATA
Scientists generally agree that sharing data is a worthwhile goal. The Data Sharing Policy of the National Science Foundation states, “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants[6].”
The MGI suggests that to benefit from broadly accessible materials data, a culture of data sharing must be supported by constructing a modern materials-data infrastructure that includes the software, hardware, and