

exponentially. The adoption curve follows a path similar to Tuckman's group development model of form-storm-norm-perform (Fig. 1). Early-stage innovators collaborate to form core consensus and capability. Subsequent waves of adopters find it more difficult, due in large part to the sheer size of the group, to reach consensus, and may question earlier decisions (a chaotic period: the storm). Then, declining numbers of adopters (converted skeptics) are more likely to accept the work of earlier adopters as the norm. After reaching this stage, the community's efforts focus on performance.
As Web 1.0 demonstrated, the key to success is establishing a flexible framework that enables the entire community to adopt and grow the technology organically over time through the development of symbiotic relationships. Web 2.0 brought the advent of social media along with community-driven content, transforming publishing into an ongoing and interactive process of micropublishing and tagging (folksonomy)[2]. This accelerated the culture shift, prompting the explosion of commerce that dominates the World Wide Web today: a symbiotic relationship between publishers, retailers, and consumers.
The Semantic Web[3], sometimes referred to as Web 3.0, extends Web 2.0 through standards developed by the World Wide Web Consortium (W3C). These standards outline a flexible framework for sharing data as opposed to sharing documents, i.e., linking data and tools across application, enterprise, and community boundaries. By defining such a framework for the interoperability of data and tools, W3C Semantic Web standards enable large-scale integration of, and reasoning over, data and analytics on the web. This framework sets the stage for a cultural transformation of the way materials scientists work.
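As a rough illustration of what these standards look like in practice, the short Python sketch below uses the open-source rdflib library to express a few materials facts as W3C-standard RDF triples. The example.org namespace and property names such as yieldStrengthMPa are hypothetical placeholders invented for this sketch, not part of any published vocabulary.

```python
# A minimal sketch of W3C-style linked data using the rdflib library.
# The namespace URI and property names are hypothetical placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

MAT = Namespace("http://example.org/materials#")  # hypothetical vocabulary

g = Graph()
alloy = URIRef("http://example.org/materials/Ti-6Al-4V")

# Data are expressed as subject-predicate-object triples, not table rows,
# so any tool that speaks RDF can consume them without a custom schema.
g.add((alloy, RDF.type, MAT.Alloy))
g.add((alloy, RDFS.label, Literal("Ti-6Al-4V")))
g.add((alloy, MAT.yieldStrengthMPa, Literal(880)))

print(g.serialize(format="turtle"))
```

Because each triple carries its own vocabulary URIs, any RDF-aware tool can consume this output without a source-specific schema.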
INTEGRATED TOOLS
In the current state of technology, applications query sets of predefined data sources. Each data source has its own schema and format, which requires custom interfaces to connect and extract content. When a new data source is introduced or a data schema changes, application logic (e.g., business rules), presentation logic, and interfaces must change. This brittle, point-to-point architecture constrains analytic and modeling capabilities and greatly increases IT costs. Further, queries are written against specific databases, so the entire set of available databases and other content repositories is never considered: although the answer to a basic question may reside somewhere in the “system,” the query will never find it.
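To make the brittleness concrete, the hypothetical Python sketch below hard-wires separate queries to two data sources with different schemas. Every table and column name here is invented for illustration, and a change to either schema silently breaks the corresponding function.

```python
# A minimal sketch of the brittle, point-to-point pattern described above.
# Table and column names are hypothetical; each source needs its own query.
import sqlite3

def yield_strength_from_source_a(db_path: str, alloy: str):
    # Query is hard-wired to source A's schema (table and column names).
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT yield_mpa FROM alloy_props WHERE alloy_name = ?", (alloy,)
    ).fetchone()
    conn.close()
    return row[0] if row else None

def yield_strength_from_source_b(db_path: str, alloy: str):
    # Source B stores the same fact under a different schema, so a second,
    # incompatible query is required; neither function sees other sources.
    conn = sqlite3.connect(db_path)
    row = conn.execute(
        "SELECT ys FROM mechanical WHERE material = ?", (alloy,)
    ).fetchone()
    conn.close()
    return row[0] if row else None
```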
To highlight the challenge of integrating tools over the web, consider the generic approach of using application programming interfaces (APIs) to make structured data available online. Web APIs provide simple query access to structured data over the HTTP protocol. High-profile examples include the Amazon Product Advertising API[4] and the Flickr API[5]. The advent of web APIs led to an explosion of small, specialized applications (or “mashups”) that combine data from several sources, each accessed through an API specific to the data provider. While the benefits of programmatic access to structured data are indisputable, the existence of a specialized API for each data set creates a landscape in which significant effort is required to integrate each new data set into an application. That is, every programmer must understand the methods available to retrieve data from each API, and then write custom code to access data from each data source.
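The hypothetical Python sketch below illustrates this per-API burden. Both endpoints and all JSON field names are invented for illustration; each new provider would require yet another hand-written function like these.

```python
# A minimal sketch of the per-API integration burden described above.
# The endpoints and JSON field names are hypothetical examples.
import requests

def density_from_provider_a(material: str) -> float:
    # Provider A: REST endpoint with query parameters; density under "rho".
    resp = requests.get(
        "https://api.provider-a.example/v1/materials",
        params={"name": material},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["rho"]

def density_from_provider_b(material: str) -> float:
    # Provider B: path-style endpoint; density nested under "properties".
    resp = requests.get(
        f"https://api.provider-b.example/materials/{material}/properties",
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["properties"]["density_g_cm3"]
```

Both functions answer the same question, yet neither can be reused for the other source: the custom code grows with every provider added.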
“Materials research is emphasizing the need for integrated computational tools and high-throughput experiments, and at the same time, product development is striving to more effectively integrate materials, manufacturing, and design,” explains Daniel Miracle, senior scientist in the Materials and Manufacturing Directorate at the Air Force Research Laboratory. “Integration of heterogeneous, distributed data is moving away from ‘wouldn’t it be nice’ to ‘we have to do it.’” This is where semantic technology comes in. Because it was designed for machine-to-machine interchange on the web and is based on open, mature standards, it is a perfect fit for this community.
Semantically linked data is schemaless: humans and software applications correlate and interpret information unambiguously using one standard. Linkage between multiple datasets, files, and their respective metadata is established without having to adhere to specific database table structures. As such, data changes do not “break” applications; conversely, application changes do not break the information fabric. Data operations are greatly simplified, enabling scientists to focus on critical research rather than continual data extract, transform, and load (ETL) functions. Linked data’s graph structure is machine-understandable, i.e., applications can infer meaning in the data, making them “smarter.”
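By way of contrast with the per-API sketches above, the sketch below (again using the rdflib library, with hypothetical file names and the same invented vocabulary as before) merges two independently produced RDF datasets and answers a question with one standard SPARQL query, with no per-source extraction code.

```python
# A minimal sketch of schemaless integration with linked data: two independent
# RDF datasets are merged and queried together. File names are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("lab_measurements.ttl", format="turtle")   # hypothetical dataset 1
g.parse("handbook_values.ttl", format="turtle")    # hypothetical dataset 2

# One SPARQL query spans both sources; adding a third dataset is simply
# another parse() call, and the query itself does not change.
results = g.query("""
    PREFIX mat: <http://example.org/materials#>
    SELECT ?alloy ?strength
    WHERE { ?alloy mat:yieldStrengthMPa ?strength . }
""")
for alloy, strength in results:
    print(alloy, strength)
```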
Where are all these semantic applications? Because applications operate on data, the question regresses to a more fundamental one: Where is all the linked data? The symbiotic relationship required for global adoption of Web 3.0 begins with data, and that leads to the final MGI challenge.
ACCESSIBLE DIGITAL DATA
Scientists generally agree that sharing data is a worthwhile goal. The Data Sharing Policy of the National Science Foundation states, “Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants[6].”
The MGI suggests that to benefit from broadly accessible materials data, a culture of data sharing must be supported by constructing a modern materials-data infrastructure that includes the software, hardware, and