Using R to get Data into Azure ML Studio — no ad-hoc connections

R lets you fetch data from various external sources, including Web pages. R practitioners are well aware of the different ways to do it: we can read plain text, tabular datasets or structured (XML or HTML) data. Under the hood, readLines (the workhorse function for reading raw text in R) takes a connection as input, and one of the functions that create connections is url:

url(description, open = "", blocking = TRUE, encoding = getOption("encoding"))

This function can be used wherever a file name is expected, for example to read the first 500 lines of a given file:

moby_url = url("http://www.gutenberg.org/ebooks/2701.txt.utf-8")
moby = readLines(moby_url, n = 500)

And this works perfectly inside RStudio.
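That readLines works on any connection, not only on one created by url, can be checked without network access. Here is a minimal sketch that uses textConnection as a stand-in for a remote source; the sample text and the variable names are assumptions made for illustration only.

```r
# Stand-in for a remote text source: an in-memory connection (illustrative data).
sample_text <- c("Call me Ishmael.",
                 "Some years ago",
                 "never mind how long precisely")
con <- textConnection(sample_text)

# readLines accepts this connection the same way it accepts url(...).
first_two <- readLines(con, n = 2)

# A connection created explicitly should also be closed explicitly.
close(con)

print(first_two)
```

The n argument limits the number of lines read, just as n = 500 does in the Moby Dick example.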

With Azure ML Studio the story is completely different. It looks like components, including R scripts, can use only local (saved) datasets, so the same code yields the following error:

[ModuleOutput] Warning message:
[ModuleOutput]
[ModuleOutput] In readLines(moby_url, n = 500) : unable to resolve 'www.gutenberg.org'
[ModuleOutput]
[ModuleOutput] DllModuleHost Stop: 1 : DllModuleMethod::Execute. Duration: 00:00:04.3179728
[ModuleOutput] DllModuleHost Error: 1 : Program::Main encountered fatal exception: Microsoft.Analytics.Exceptions.ErrorMapping+ModuleException: Error 0063: The following error occurred during evaluation of R script:
[ModuleOutput] ---------- Start of error message from R ----------
[ModuleOutput] cannot open the connection
[ModuleOutput]
[ModuleOutput]
[ModuleOutput] cannot open the connection
[ModuleOutput] ----------- End of error message from R -----------

What are the practical implications? In R, connections can be created and opened implicitly, like this:

url2file <- "http://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
data <- read.csv(url2file)
head(data)
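To make "implicitly" concrete, the sketch below compares the two routes on a local temporary file, so it runs without network access. The tiny CSV and all variable names are assumptions for illustration and have nothing to do with the survey file above.

```r
# Write a tiny CSV to a temporary file (illustrative data only).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20"), csv_path)

# Implicit: read.csv gets a character string and opens the connection itself.
implicit <- read.csv(csv_path)

# Explicit: create and open the connection first, pass it in, then close it.
con <- file(csv_path, open = "r")
explicit <- read.csv(con)
close(con)

# Compare the results of the two routes.
same <- identical(implicit, explicit)
print(same)

# Clean up the temporary file.
unlink(csv_path)
```

A URL string passed to read.csv is handled in the same implicit way, which is why that call needs network access at the moment it runs.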

This commonly used feature is not available to the R Script component inside Azure ML Studio.

Instead, you have to upload a data file into Azure, place it in the experiment area as a saved dataset, and connect the dataset's output to an input of the R Script component.
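Inside the R script, the connected dataset then arrives through an input port rather than a URL. Here is a rough sketch, assuming the maml.mapInputPort and maml.mapOutputPort functions found in the default template of the Execute R Script module. Those functions exist only inside Azure ML Studio, so the stand-ins defined at the top are hypothetical stubs that let the sketch run locally, and the sample data is made up.

```r
# Outside Azure ML Studio the maml.* functions do not exist, so define
# stand-ins for local testing only (hypothetical stubs, not the real API).
if (!exists("maml.mapInputPort")) {
  maml.mapInputPort <- function(port) {
    data.frame(id = 1:3, value = c(10, 20, 30))  # illustrative data
  }
  maml.mapOutputPort <- function(name) {
    invisible(get(name, envir = parent.frame()))
  }
}

# Read the saved dataset connected to the first input port as a data frame.
dataset1 <- maml.mapInputPort(1)

# From here on it is ordinary R code working on a data frame.
data.set <- head(dataset1, n = 2)

# Hand the data frame, named as a string, to the module's output port.
maml.mapOutputPort("data.set")
```

In a real experiment only the last three statements would be needed; the stub section is there purely so the script can be tried outside Azure.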

Interested? See you at SQL Day Workshop.

This entry was posted in Machine Learning, R Scripts by Marcin Szeliga.

About Marcin Szeliga

Awarded the Microsoft Most Valuable Professional title in the SQL category every year since 2006. A consultant, lecturer, authorized Microsoft trainer with 15 years' experience, and a database systems architect. He prepared Microsoft partners for the upgrade to SQL Server 2008 and 2012 within the Train to Trainers program. A speaker at numerous conferences, including Microsoft Technology Summit, SQL Saturday, SQL Day, Microsoft Security Summit and Heroes Happen {Here}, as well as at user group meetings. The author of many books and articles devoted to SQL Server.
