US Ignite Tutorial on CloudLab.US

I’m reviewing the US Ignite Tutorial on CloudLab.US, featuring an OpenStack Juno on Ubuntu 14.10 instance with a controller, network manager, and one compute node. This profile runs on either x86 or ARM64 nodes. It takes advantage of Vanilla Apache Hadoop, Hortonworks Data Platform, Apache Spark, etc.

CloudLab is a leading-edge laboratory for exploring and applying new computer cloud architectures at scale. The CloudLab infrastructure consists of three new clusters at U. Wisconsin, Clemson U., and U. Utah augmenting the existing Emulab and GENI (Global Environment for Network Innovations) distributed computing facilities. Each of the new clusters is aimed at providing hardware support for a different point in the cloud design space, and together they represent an extraordinary flexibility for computer scientists to try new ideas and for domain scientists to match CloudLab infrastructure to their applications.

CloudLab is a project of the University of Utah, Clemson University, the University of Wisconsin-Madison, the University of Massachusetts Amherst, Raytheon BBN Technologies, and US Ignite. CloudLab is part of the National Science Foundation’s NSFCloud program. To design and build the CloudLab facility, we’re partnering with three vendors: Cisco, Dell, and HP. Seagate has also provided a generous donation of hard drives.

Another similar project is Chameleon. Some videos here.

UPDATE: see also this additional recently announced project: Cornell to Lead NSF-Funded Cloud Federation for Big Data Analysis By David Raths 11/04/15

Tennessee Big Data Science

Tackling Big Genomics Data

Data Logistics Toolkit (DLT) [Indiana University; University of Tennessee, Knoxville; Vanderbilt University] @NSF Grant Number OCI‐1246282

Two UT Researchers Win Fifth Annual IDEA Awards

2010 Internet2 Driving Exemplary Applications (IDEA) Award for their work in developing a network storage infrastructure that will aid the nation’s researchers and educators in transferring large amounts of data and research quickly and easily for collaboration.

REDDnet: Enabling Data Intensive Science in the Wide Area


Update: (ht Ben White via Facebook)
40 maps that explain the internet

This Day in Health Science

On the one hand Decline in Research and Development :: Drug Safety Executive Blog

Dawn Van Dam
General Manager
Cambridge Healthtech Associates™
T: 781-707-8289

and on the other:

The Center for Integration of Research on Genetics and Ethics (CIRGE) presents our next Journal Club of the
2013-2014 Academic Year
Ethical Developments in the FDA’s Halt of 23andMe, Inc.
Dr. Marsha Michie PhD, CIRGE Post-Doctoral Fellow
When: Wednesday, January 29th from 1pm – 2pm *NOTE SPECIAL TIME*
Where: SCBE Conference Room
(Directions HERE)

Additionally, this discussion will be led by CIRGE Co-PI and Stanford Law School Professor Hank Greely and SCBE Senior Research Scholar Sandra Lee!

There are three readings assigned for this journal club, and they can be found here!

PLEASE LIKE The Center for Integration of Research on Genetics and Ethics and the Stanford Center for Biomedical Ethics on Facebook!

For more information on Journal Clubs and to access the readings for this and all Journal Clubs, please visit:

Please join us, and forward widely!

Colleen M. Berryessa
Program Manager, Center for Integration of Research on Genetics and Ethics (CIRGE)
Stanford University Center for Biomedical Ethics
1215A Welch Road, Room 71
Stanford, CA 94305-5417
W: (650) 736-0954
F: (650) 723-6131

Meanwhile, over at caGrid and the National Cancer Informatics Program.

David Foster, CERN: Implementation of a European e-Infrastructure for the 21st Century

Please find below a pointer to a recently published paper entitled “Implementation of a European e-Infrastructure for the 21st Century”.

The objective of the implementation plan is to put in place the e-infrastructure commons that will enable digital science by introducing IT as a service to the public research sector in Europe.

The rationale calls for a hybrid model that brings together public and commercial service suppliers to build a network of Centres of Excellence offering a range of services to a wide user base. The platform will make use of and cooperate with existing European e-infrastructures by jointly offering integrated services to the end-user. This hybrid model represents a significant change from the status-quo and will bring benefits for the stakeholders: end-users, research organisations, service providers (public and commercial) and funding agencies.

Centres of Excellence can be owned and operated by a mixture of commercial companies and public organisations. Their portfolio of services, starting with those listed by eIRG and the High Level Expert Group on Scientific Data, will be made available under a set of terms and conditions that are compliant with European jurisdiction and legislation with service definitions implementing recognised policies for trust, security and privacy notably for data protection. A funding model engaging all stakeholder groups is described. The ability to fully exploit the potential for knowledge and job creation that is locked-up in the datasets and algorithms to be hosted by the Centres of Excellence will require the nurturing of a new generation of data scientists with a core set of ICT skills.

A management board, on which all the organisations operating Centres of Excellence are represented, will provide strategic and financial oversight. It is coupled with a user forum through which the end-users themselves, in a cross-disciplinary body, collaborate to define requirements and policies for the services. A pilot service is proposed that can be rapidly established by building on the existing investments. The pilot service will demonstrate the feasibility of the e-infrastructure Centres of Excellence model for a range of scientific disciplines and evaluate its suitability for the ESFRI Research Infrastructures, which are currently under development and represent Europe’s future “big data factories”. Implementation will start in 2014, initially offering a limited set of services at a prototype Centre of Excellence.

This is the third in a series of documents by the EIROforum IT working group on the future of e-infrastructures. The documents from the EIROforum IT working group are seen as a starting point for an inclusive activity that will bring together a number of e-infrastructures into a public-private ecosystem where the value of the whole is far greater than that of any individual component.

We welcome your feedback on these documents, to improve the concept, its relevance and implementation. The intention is to revise the documents based on the feedback received.

You can provide feedback on the documents via the open access repository (you will need to log in with a CERN or EGI account; alternatively, you can create a CERN lightweight account):

A Vision for a European e-Infrastructure for the 21st Century:


Science, Strategy and Sustainable Solutions, a Collaboration on the Directions of E-Infrastructure for Science:


Implementation of a European e-Infrastructure for the 21st Century:


In addition, an email list has been created, to which you can subscribe and pose questions of a more general nature (to subscribe, simply send an email request to the list).

Activities will now focus on putting in place the structures defined in these documents, notably the User Forum, for which Jamie Shiers will be leading the preparation for its first meeting. The prototype e-infrastructure Centre of Excellence described in the implementation plan is being established by the IT department at CERN and a dedicated email list for subscription and questions has been created: (to subscribe simply send an email request to the list).

David Foster, CERN

Data News Speed

The DIGITAL DIVIDE isn’t about haves and have-nots anymore; it’s about HAVE IN TIME TO BE ACTIONABLE. Monopolies, oligarchies, and boutique data aggregators own the actual currency upon which societies are based:

Chattanooga media jostle in digital fast lane

The citywide rollout of a gigabit Ethernet connection and a brand new, job-proliferating Volkswagen plant have sparked an unusually high level of online competition in the nation’s 86th DMA. Wehco Media’s Times-Free Press leads in traffic while a bevy of rivals, including a strong Internet pureplay, are vying for attention.

High-Frequency Traders Flat-Out Buying Data Ahead of You

When the Institute for Supply Management releases its index of manufacturing activity next week, the headlines from the report will flash to traders at what their eyes tell them is 10:00 am. But unless they are subscribers to a new low-latency feed provided by Thomson Reuters, they’ll actually be getting it late—and depending on how they’re positioned, it could be too late.

04/24/2012 Internet2 General Session focused on transformation PDF

04/25/2012 Internet2 General Session focused on innovation PDF

Smart Grid Requires Utilities To Merge IT And OT Worlds


A: Because You Don’t ROWE Q: Why Can’t Tennessee Innovate? [Update]

Why Nashville Companies Are Targeting Tweens For High-Tech Jobs BY ALISSA WALKER | 07-09-2012

See here for news on ROWE in Nashville. Nicholas Holland demonstrates with his ROWE notes.

My older ROWE related posts here.

# # # #

Mar 13, 2012

What good do personal clouds and corporate data hives, acquihires and crowdsourcing do to meet your needs (as HR continues to stumble around trying to hire long-term individuals for short-term projects, meanwhile preparing for the year-end mass layoffs which inexorably ensue) if your managers cannot get past their love affair with physical MBWA when your employees are enculturated to do their best work in virtual innovation clusters and collaboratories (see article comments) which take place in a Second Life CoLab or some such? What good does it do to build a city-wide innovation grid infrastructure or a country-wide innovation cyberspace if you still expect your employees to waste an hour of their day driving to and from a cube which holds a desktop computer when they have a speedier, more robust laptop at home? 1) Learn about Results Only Work Environments. 2) Invest in them. 3) Use them.

This Day in Really Fast Data

Making Broadband Construction Faster and Cheaper

NSF Leadership in Discovery and Innovation Sparks White House US Ignite Initiative

White House US Ignite Announcement and Discussion on Facebook Live

Map of Partners

Internet2 Statement Regarding Launch of US Ignite

Cha# is Gig Poster Child (video)

Build Eisenhower’s Highway System for Today’s Needs

Dear Colleague Letter: New Solutions to Create Integrative Data Management Infrastructure(s) for Research Across the Sciences

Demystifying Financial Services Semantic Conference – The Business Value of Data and Semantics

What Is the Weak Link re: Big Data in Financial Services?

On the one hand CIOs Say Information Management Programs Are Underfunded while on the other firms like Morgan Stanley Takes On Big Data With Hadoop.

The way it has typically been done for 20 years is that IT asks the business what they want, creates a data structure and writes structured query language, sources the data, conforms it to the table and writes a structured query. Then you give it to them and they often say that is not what they wanted.

– Gary Bhattacharjee, executive director of enterprise information management at Morgan Stanley

Do you see the weak link?

Draft Agenda BIG DATA Workshop June 13-14

Day 1:
8:30 – 8:45: Introduction
8:45-10:15: Keynote

  • Ian Foster, Argonne National Lab
  • Michael Stonebraker, SciDB
  • Neal Ziring, NSA, Technical Director of Information Assurance Directorate

10:15-10:30 Break
10:30 – 12:00 Session 1: Data & Algorithms
Mike Franklin, Director, AMPlab, UC Berkeley
Kirk Borne, Professor of Astrophysics and Computational Science, George Mason University
Arie Shoshani, DOE Data Initiative, LBL
12:00-1:00 Lunch
1:00 – 2:30 Session 2: Health Care Analytics
Bob Grossman, Director of Informatics at the Institute for Genomics and Systems Biology
Harry Greenspun, Senior Advisor, Deloitte Center for Health Solutions
Mark Adams, caBIG Project Manager, Booz-Allen
2:30-2:45: Break
2:45-4:30: Big Data Programs Panel
Howard Wactlar, NSF
Tsengdar Lee, NASA
Lucy Nowell, DOE
Peter Lyster, NIH
Peter Highnam, IARPA
Frederica Darema, Air Force Office of Scientific Research
4:30-6:00 Posters
6:00-8:00 Reception – Gaithersburg Hilton
Day 2:
8:30 – 9:45 Session 4: Science Analytics
Artur Barczyk, Caltech
Peter Fox, Virtual Observatories
Milt Halem, Earth Science
10:00-11:15 Session 5: Business Analytics
Charles Kaminski, Chief Architect, LexisNexis
Gary Reichter, Global Business Solutions, T. Rowe Price
Joe Schwartz, Lockheed Martin
11:15 – 11:30 Break
11:30 – 12:45 Session 6: BIG DATA Platforms
Dennis Gannon, Microsoft
John McPhearson, IBM
Robert Stackowiak, Oracle
12:45-1:00 Wrap-Up