We are scheduling all our events through our “Research Triangle Analysts” Meetup group. You can also view all events in calendar form and watch the videos of some of our talks. We have a meeting on the third Tuesday of every month at 6:30 p.m. (location and topic to be announced). We also have a monthly lunch on the first Friday of every month at noon (location to be announced).



Friday, April 1st, 2016 – noon (monthly lunch):
Location: Baba Ghannouj, 5400 S Miami Blvd, Durham, NC (map)

“All Graphs Are Wrong, but Some are Useful” with Xan Gregg

Tuesday, April 19th, 2016 – 6:30 p.m. (regular meeting): Doors open at 6:30pm, presentation starts at 7:00pm.
Location: Renaissance Computing Institute (RENCI)
Suite 590, Biltmore Conference Room (Room 534)
100 Europa Drive, Chapel Hill, NC

Data visualization is our most efficient tool for understanding information, but it’s far from perfect. Collected data is an imperfect representation of the underlying information. A graph is an imperfect representation of the data. Our understanding is an imperfect representation of the graph. But don’t despair. Xan Gregg, creator of the Graph Builder, will talk about how understanding visual perception can help us make more effective data visualizations.

Xan Gregg leads data visualization development at JMP, a business unit of SAS that specializes in data visualization software. He created the Graph Builder feature introduced in JMP 8 and continues to be its principal developer. Gregg is a frequent contributor to the JMP Blog and is known for his series of graph makeover posts. He is founder of the One Less Pie social media campaign, which seeks to replace inappropriate pie charts with better alternatives. Gregg participates in the online JMP User Community and often speaks at customer events like the JMP Discovery Summit. At the inaugural Discovery Summit Europe, Gregg won the award for Best Invited Paper.

Gregg is an active participant on visualization question-and-answer sites Cross Validated and HelpMeViz. In 2006, he won first place in Business Intelligence Network’s Data Visualization Competition. Gregg has participated in volunteer hackathons, including one that produced a highly acclaimed graphic for the 2015 Hunger Report. His primary fields of interest are exploratory data analysis and information visualization. He is also a regular at RTA Meetups!

ARCHIVED 2016 Events


RTA is proud to support Data4Decisions this year! Register today to learn how to unlock the power of your data! Get $100 off if you register before February 19th, and use promo code RTA25 to save an additional $25!

 Presentation: Spark vs. Hadoop for Big Data

Tuesday, March 15th, 2016 – 6:30 p.m. (regular meeting): Doors open at 6:30pm, presentation starts at 7:00pm.
Location: HCA, Inc. Auditorium, Fuqua School of Business, One Science Drive, Durham, NC (map)

As the landscape of big data processing engines rapidly evolves, many are left to wonder which tool they should use: Hadoop, Spark, or something else? In this session, you’ll learn which engines are the best fit for specific use cases, skill sets, code deployments, and resource constraints. We’ll provide a framework for deciding on the right tool for your job and help you determine when you shouldn’t choose one over the other, but rather use both in conjunction. You can also expect to learn how cloud technologies are simplifying infrastructure to make it easier to leverage multiple data engines and avoid lock-in.

Presenters: Technology experts (Kalen Zhang and Phil D’Agostino) from Qubole


Saturday, March 12th, 2016 – 8am-5pm:
Location: Blue Cross Blue Shield NC, 4727 University Dr, Durham, NC

Analytics Forward 2016 is an unconference hosted by and for analytics professionals.

“Opening the Black Box: A Users Guide to Optimization” with Melinda Thielbar

Tuesday, February 16th, 2016 – doors open at 6:30 pm, start at 7:00 pm:
Location: Reynolds Auditorium, Fuqua School of Business, One Science Drive, Durham, NC (map)

The auditorium is on the right side of the building when you enter from the Tower View entrance. Parking spaces on Fuqua Drive are limited, but visitor parking is also available on Science Drive.

Statistical models, recommendation engines, data visualization, and artificial intelligence all have one thing in common: They use algorithms to find the best answer (or to rank potential answers so the user can pick one). Numerical optimization is at the heart of this process. Understanding how the computer finds its answers can help you get the most out of your data–no matter what analysis software you use.

Intended as a user’s guide to numerical algorithms, this talk shows the most common types of computer optimization, how and why they’re used in different kinds of data analysis, and their underlying assumptions. Examples with visualizations are shown in JMP software, but the concepts apply to any statistical or data visualization package.

Whether you’re new to data analysis or writing your own optimization routines, this presentation demonstrates a useful way to think about computer optimization that will open up the black box between “data in” and “answers out”.
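
As a rough illustration of the kind of iterative search the abstract describes (not taken from the talk itself), the Python sketch below minimizes a one-variable function by gradient descent. The objective function, step size, and stopping tolerance are all assumptions chosen for the example.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimize a one-variable function given its derivative `grad`.

    Starts at x0 and repeatedly moves against the gradient until the
    update is smaller than `tol` or `max_iter` iterations have run.
    Returns the final estimate and the number of iterations used.
    """
    x = x0
    for i in range(1, max_iter + 1):
        new_x = x - step * grad(x)
        if abs(new_x - x) < tol:
            return new_x, i
        x = new_x
    return x, max_iter


# Hypothetical objective: f(x) = (x - 3)^2, whose minimum is at x = 3.
def f_prime(x):
    return 2 * (x - 3)


x_min, n_iter = gradient_descent(f_prime, x0=0.0)
print(f"estimated minimum at x = {x_min:.6f} after {n_iter} iterations")
```

The underlying assumption here is that the function is smooth and has a single minimum; with several local minima, the answer would depend on the starting point `x0`.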

Melinda Thielbar, PhD (JMP Senior Research Statistician Developer, SAS) currently specializes in statistical methods for consumer research and categorical data, though she has experience as far ranging as naval power system analysis, fraud detection and Hollywood script consulting. Melinda is a Co-Founder of Research Triangle Analysts, a crazy cat lady, and an enthusiastic amateur artist.

Special Interest Group: JMP – Kickoff Meeting

Tuesday, February 9th, 2016 – 6:30pm-8:30pm (RTA subgroup):
Location: Relish Cafe, 5625 Creedmoor Rd Raleigh, NC 27612


Friday, February 5th, 2016 – noon (monthly lunch):
Location: Page Road Grill, Page Road, Durham, NC

Special Interest Group: Sport Analytics – Kickoff Meeting

Thursday, January 28th, 2016 – 7pm-9pm (RTA subgroup):
Location: Bull City Co-Working, 112 S. Duke St. – Suite 6, Durham, NC

Do you get excited using math with sports data? Did you learn analytics through sports? Want to share opportunities to use sports analytics to teach, for a hobby, for research, for a career?

This is our kickoff meeting for this new subgroup. There will be a brief overview of sports analytics and then the floor will be open for discussion on future topics/speakers. Feel free to bring use cases for discussion.

“Topological Data Analysis” with Hamza Ghadyali

Tuesday, January 19th, 2016 – 6:30 p.m. (regular meeting):

Location: Renaissance Computing Institute (RENCI)
Suite 590, Biltmore Conference Room (Room 534)
100 Europa Drive
Chapel Hill, NC

“Data has shape– and shape has meaning.” [1] Topology is the mathematical study of shape, and in the past decade topological data analysis (TDA) tools have been applied to large, noisy, complex datasets to understand problems in many science and engineering disciplines, including oncology, astronomy, and neuroscience. In this talk, I’ll explain what topology is, briefly go into the mathematics of persistent homology and Morse-filtrations, and discuss some applications in signal processing, clustering, and pattern recognition. To ensure that everyone gets something out of the talk, pictures will be emphasized over formulas.
[1] Quote from Gunnar Carlsson (one of the founders of TDA)

Hamza Ghadyali is a Ph.D. candidate in mathematics at Duke, developing new TDA tools, in particular for the analysis of EEG data from people with epilepsy.

ARCHIVED 2015 Events


Friday, December 4th, 2015 – noon (monthly lunch):
Location: Moe’s Southwestern Grill

Plan next year!

Wednesday, November 18th, 2015 – 6:30 p.m. (planning meeting):
Location: MEZ Contemporary Mexican, 5410 Page Road, Durham, NC

November is planning month at Research Triangle Analysts. Join us for a beverage while we all talk about what brought you to our past events (the survey is now closed) and what will bring you to our future events.

 NC Data4Good: Data Crunch

Saturday, November 7th, 2015 & Sunday, November 8th, 2015 – all day:
Location: MaxPoint, 3020 Carrington Mill Blvd, Suite 300, Morrisville, NC 27560

Description: We have partnered with United Way of the Greater Triangle, Data Crunch Lab, and MaxPoint to address childhood hunger and food insecurity here in the Triangle (results from the 2015 Data Crunch for Social Good).


Friday, November 6th, 2015 – noon (monthly lunch):
Location: Primal Food and Spirits, 202 W N.C. Hwy. 54, Durham, NC

Board Meeting

Monday, October 26th, 2015 – 5:30 pm (open to public):
Location: Randy’s Pizza, 5311 S Miami Blvd, Durham, NC 27703

Melissa Nysewander presented: “Applied Data Science: A Case Study in Workforce Analytics”

Thursday, October 15th, 2015 – 6:30 p.m. (regular meeting): Doors at 6:30pm, presentation starts at 7:00.

Location: Fidelity Investments, 100 New Millennium Way, Durham, NC
Park in the parking deck, accessible on Select Dr., near the Davis Dr. intersection. Cross the bridge to the main building, have your ID ready and register as a visitor at security. A Fidelity employee will escort you.

Data science is more than a single algorithm or technology; it is a methodology tying together scientific reasoning, hypothesis testing, machine learning, and statistics. It is about knowing enough programming to grab and manipulate data at the finest grain, and enough statistics to extract real (not spurious) insights. But it is also about being able to ask the right questions, design meaningful tests, and in the end, communicate results to the people making decisions. This talk is a practical explanation of what it takes to successfully execute an enterprise-level data science project, from beginning to end, emphasizing both the soft and hard skills necessary to do so. To illustrate, our speaker will present a recent case study in workforce analytics in which she performed text analysis using Python & R on scraped web data.

After Party:
Serena, 5311 S Miami Blvd, Durham, NC 27703. Close by, good beer and tasty food.


Friday, October 2nd, 2015 – noon (monthly lunch):
Location: Baba Ghannouj, 5400 S Miami Blvd, Durham, NC 27703

Danny Siegle presented: “Machine Learning and the Life Sciences”

Thursday, September 17th, 2015 – 6:30 p.m. (regular meeting): Doors at 6:30pm, presentation starts at 7:00.

Location: The Redwoods Group, 2801 Slater Rd # 110, Morrisville, NC 27560

This presentation covers machine learning from different biological domains, together with working code examples, including:
1. QSAR prediction for drug discovery
2. A Next-Gen Sequencing application
3. Code example from the recent Kaggle diabetic retinopathy competition (diagnostic image analysis)

The goal of this talk is not to cover technical details of every method but to help biologists see the value of machine learning and statisticians understand opportunities in the biological sciences.
The presentation notebook is on GitHub, so that anyone interested in taking a deeper dive can run the code and see the results.


Friday, September 4th, 2015 – noon (monthly lunch):
Location: Neomonde, 3817 Beryl Rd, Raleigh, NC 27607

Lucia Gjeltema presented: “SparkR – distributed computing in R using Spark clusters”

Thursday, August 20th, 2015 – 6:30 p.m. (regular meeting):

Location: Cameron Village Regional Library

Description (code for the demo):
Data processing and machine learning tasks in R are usually limited to data sets that fit in the memory of one single machine. The new R frontend to Apache Spark, called SparkR, harnesses Spark’s distributed computing powers to run large-scale data analysis directly in R. Originally an R package, SparkR is now officially merged into Apache Spark (since release 1.4 in June 2015).
This talk introduces SparkR and one of its core components – the SparkR DataFrame, a way of bringing distributed computing capabilities to the world of data frames.


Friday, August 7th, 2015 – noon (monthly lunch):
Location: Vit Goal Tofu Korean Restaurant, 2107 Allendown Dr #101A, Durham, NC 27713 (I-40 exit 278)

Chris Calloway presented: “Python Data Science with Pandas”

Thursday, July 16th, 2015 – 6:30 p.m. (regular meeting):

Location: Renaissance Computing Institute (RENCI)
Suite 590, Biltmore Conference Room (Room 534)
100 Europa Drive
Chapel Hill, NC

Description (video of the talk):
Pandas is a software package providing R-like “data frame” wrangling in Python. We interactively explore
• Data input and output
• Data transformation
• Data analysis
• Data visualization
with Pandas using some interesting data to answer contemporary social questions.


Friday, July 3rd, 2015 – noon (monthly lunch):
Location: Dim Sum House, 100 Jerusalem Drive #104, Morrisville, NC 27560

Ian Cook presented: “Working with Geospatial Data”

Thursday, June 18th, 2015 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library, 1930 Clark Avenue, Raleigh, NC

Ian Cook talks about working with Geospatial data for analysis, including:
• What spatial data is.
• Where you can find spatial data to work with.
• The challenges of working with spatial data.
• Key facilities R provides for loading, manipulating, and analyzing spatial data.
Demonstrations are in R and Spotfire, with an open discussion for others to talk about how they work with spatial data using their preferred tools.

Link to slides and source code:
Link to video:


Friday, June 5th, 2015 – noon (monthly lunch):
Location: Pulcinella’s Italian Restaurant, 4711 Hope Valley Road, Durham, NC 27707

Cyber Security Mini-Hackathon

Thursday, May 21st, 2015 – 6:30 p.m. (regular meeting):
Location: Cisco Systems Building 7, Maggie Valley Conference Room
7100-7 Kit Creek Rd, Morrisville, NC

This is a joint meetup with the Big Data and Cyber Security Meetup. Bring your laptops and be ready to work with security experts to understand network data and develop ways to analyze it!

We will be working with one of the data sets from this site: (suggestions for which data set from this list are welcome).
If you don’t have it already, you will probably want the free program Wireshark on your laptop:
Also, have your analytics program of choice loaded up and ready to go!

Analysis and results can be posted on


Friday, May 1st, 2015 – noon (monthly lunch):
Location: Mellow Mushroom, 410 Blackwell Street, Durham, NC

Rajesh Seluklar presented: “Functional Modeling of Longitudinal Data”

Thursday, April 16th, 2015 – 6:30 p.m. (regular meeting):
Location: RENCI, 100 Europa Drive, Suite 540, Chapel Hill, NC (map)

Description [link to Rajesh’s paper]:
In many studies, a continuous response variable is repeatedly measured over time on one or more subjects. The subjects might be grouped into different categories, such as cases and controls. The study of resulting observation profiles as functions of time is called functional data analysis. This paper shows how you can use the SSM procedure in SAS/ETS® software to model these functional data by using structural state space models (SSMs). A structural SSM decomposes a subject profile into latent components such as the group mean curve, the subject-specific deviation curve, and the covariate effects. The SSM procedure enables you to fit a rich class of structural SSMs, which permit latent components that have a wide variety of patterns. For example, the latent components can be different types of smoothing splines, including polynomial smoothing splines of any order and all L-splines up to order 2. The SSM procedure efficiently computes the restricted maximum likelihood (REML) estimates of the model parameters and the best linear unbiased predictors (BLUPs) of the latent components (and their derivatives). The paper presents several real-life examples that show how you can fit, diagnose, and select structural SSMs; test hypotheses about the latent components in the model; and interpolate and extrapolate these latent components.


Friday, April 10th, 2015 – noon (monthly lunch):
Location: Salsa Fresh (their Morrisville location), 3588 Davis Drive, Morrisville, NC

 “Have an idea, need an idea” – An Unmeeting

Thursday, March 19th, 2015 – 6:30 p.m. (regular meeting):
Location: Village Draft House
428 Daniels St (Cameron Village), Raleigh, NC

This is ‘have an idea/need an idea’. Come with a question about analytics, something cool you’ve done, or a problem that has you stumped. Get some feedback from your fellow analysts about where you can look for more resources or go next!
We’ll be sitting at tables of 6 or so, so use this discussion board here to get the conversation started about what we should discuss!

Analytics Forward – An Unconference

Saturday, March 14th, 2015  a.k.a. 3/14/15 a.k.a. Pi day – 8am-5pm (special event):
Location: Blue Cross Blue Shield NC, 4727 University Place, Durham, NC 

Analytics Forward is a free unconference by and for analytics professionals. Thanks to our amazing sponsors Blue Cross and Blue Shield of North Carolina, JMP, MaxPoint, and NCDS, we spent a Saturday at the Blue Cross and Blue Shield of North Carolina campus learning about the latest techniques, trends, and tools in analytics.


Friday, March 6th, 2015 – noon (monthly lunch):
Location: Guglhupf, 2706 Durham-Chapel Hill Blvd, Durham, NC

Tim Hopper presented: “Pyspark”

Thursday, February 19th, 2015 – 6:30 p.m. (regular meeting):
Location: Bronto Software Inc.
Suite 410, 324 Blackwell Street, Durham, NC

Description (slides and code):
Apache Spark is a next generation cluster computing framework and data processing engine. By combining Spark’s primitive operations in a functional style, the user can perform complex computations on large datasets. Though similar to Hadoop, Spark relies much more heavily on RAM (instead of HDFS) and has been demonstrated as running up to 100x faster than Hadoop for some applications. This talk will introduce Spark in general and then show PySpark, the Python wrapper around core Spark, as a tool for rapid, interactive analytics as well as robust, production data pipelines. Finally, we will look at MLlib, Spark’s distributed machine learning library.

Tim Hopper is a software engineer at a web analytics startup. He has a master’s in operations research from North Carolina State University.


Friday, February 6th, 2015 – noon (monthly lunch):
Location: Moe’s Southwestern Grill, 127 Westin Parkway, Cary, NC

 Grant Ingersoll presented: “Solr 5: scalable search and analytics in one place”

Thursday, January 15th, 2015 – 6:30 p.m. (regular meeting):
Location: Renaissance Computing Institute (RENCI)
Suite 590, Biltmore Conference Room (Room No. 534), 100 Europa Drive, Chapel Hill, NC

Search engine technology is rapidly evolving from keyword-based lookups to highly sophisticated ranking engines capable of incorporating many different features across complex data types. With the pending release of Apache Solr 5, it is now possible to ask more interesting questions of multi-structured content than ever before. In this talk, we’ll explore how Solr 5 provides a number of new and interesting features for analysts, ranging from incredibly easy data ingest to advanced faceting and statistical capabilities, and why Solr should be in every analyst’s toolbox.

Grant is the CTO and co-founder of LucidWorks, co-author of “Taming Text” from Manning Publications, co-founder of Apache Mahout and a long-standing committer on the Apache Lucene and Solr open source projects. Grant’s experience includes engineering a variety of search, question answering and natural language processing applications for a variety of domains and languages. He earned his B.S. from Amherst College in Math and Computer Science and his M.S. in Computer Science from Syracuse University.

Link to video:


Friday, January 2nd, 2015 – noon (monthly lunch):
Location: Nantucket Grill, 5925 Farrington Rd, Chapel Hill, NC 27517

## ARCHIVED 2014 Events

Thursday, November 20th, 2014 – 6:30 p.m. (planning meeting):
Location: MEZ Contemporary Mexican Restaurant, 5410 Page Road, Durham, NC

Plan next year! November is planning month at Research Triangle Analysts. Come have a beer and talk about what you want to do next year. We will send out a survey and go through the findings during our meetup.

Friday, November 7th, 2014 – noon (monthly lunch):
Location: Cilantro (Mediterranean Grill), 5400 South Miami Blvd, Suite 102, Durham, NC

Thursday, October 16th, 2014 – 6:30 p.m. (regular meeting):
Location: Chow, 311 Creedmoor Rd, Raleigh, NC 27613

Steve Geringer presented: “How to Build Effective Machine Learning Applications”

Machines are getting smarter every day. How do they do it? What will be left for the humans once the machines completely take over? Learn how you can contribute to the subjugation of mankind by building your own machine learning applications. While we won’t cover sci-fi or philosophical aspects, we will cover many important technical considerations for building effective machine learning (ML) applications.
Steve Geringer is a Triangle-area software consultant and ML enthusiast.

Friday, October 3rd, 2014 – noon (monthly lunch):
Location: Cafe Carolina and Bakery, 137 Weston Pkwy, Cary, NC 27513

Thursday, September 18th, 2014 – 6:30 p.m. (regular meeting):
Location: RENCI, 100 Europa Drive, Suite 540, Chapel Hill, NC (map)

Elizabeth Claassen presented: “Improved Inference in Generalized Linear Mixed Models”

In small samples it is well known that the standard methods for estimating variance components in a generalized linear mixed model (GLMM), pseudo-likelihood and maximum likelihood, yield estimates that are biased downward. An important consequence of this is that inferences on fixed effects will have inflated Type I error rates because their precision is overstated. We introduce a new method for estimating parameters in GLMMs that applies a Firth bias adjustment to the maximum likelihood-based GLMM estimating algorithm. We apply this technique to one- and two-treatment logistic regression models with a single random effect. We show simulation results that demonstrate that the Firth-adjusted variance component estimates are substantially less biased than maximum likelihood estimates and that inferences using the Firth estimates maintain their Type I error rates more closely than the standard methods.

Friday, September 5th, 2014 – noon (monthly lunch):
Location: McDaids Irish Pub & Restaurant, Hillsborough Street, Raleigh, NC

Thursday, August 21st, 2014 – 6:30 p.m. (regular meeting):
Location: Blue Cross Blue Shield, 5901 Chapel Hill Blvd, Chapel Hill, NC (map)

Laurel Trantham presented: “Utilization and Substitution of Urgent Care, Emergency Departments, and Primary Care Physicians”

Blue Cross Blue Shield is always looking to reduce healthcare costs. One driver of high costs is that many individuals receive medical care at emergency rooms when urgent care centers and primary care offices may be more appropriate sites of care.  Laurel Trantham reviewed some of the analysis in this area, including why this is important to explore, and discussed several modeling approaches being considered.

Friday, August 1st, 2014 – noon (monthly lunch):
Location: MEZ Contemporary Mexican Restaurant, 5410 Page Road, Durham, NC

Thursday, July 17th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Brian Fannin presented: “Statistics Without Borders”

Brian Fannin shared his experience as part of ‘Statistics Without Borders’. The team spent a week in Africa teaching R and statistical modeling to members of the Rwandan Biomedical Center.

Brian is not a proper statistician (he’s an actuary), but he loves R, loves to travel and loves to try and make the world a better place through data. He especially loves doing all three at once.

Friday, July 11th, 2014 – noon (monthly lunch):
Location: An, 2800 Renaissance Park Place, Cary, NC 27513 (I-40 exit 287)

Thursday, June 19th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Mason DeCamillis presented: “Introduction to Julia”

Julia is a relatively new programming language that aims to blend the good parts of Matlab, C, R, and Python (with fewer of the bad parts). Its growth in popularity makes it an increasingly promising option for programmers doing technical, computationally-intensive work. This presentation explored the advantages of Julia in a data analysis context, with examples from both the base library and several user-written packages.

Mason DeCamillis is a statistical programmer and data analyst with a Master’s degree in Applied Statistics and a knack for crashing his computer by testing out experimental software. He is cautiously enthusiastic about Julia and is excited to share with Research Triangle Analysts.

Friday, June 6th, 2014 – noon (monthly lunch):
Location: City Beverage, 4810 Hope Valley Rd, Durham, NC 27707

Thursday, May 15th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Joseph Morgan presented: “Covering Arrays”

Software (and analytical model) testing may require considering hundreds or thousands of parameters. Usual “test all” or “full factorial” methods can require too many runs to be practical. Covering arrays make it possible to consider “full coverage” of a software suite with a smaller number of runs.
Joseph Morgan, Senior Software Developer for JMP at SAS Institute, presented his research on this important field.
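
To make the idea concrete, here is a small Python sketch (an illustration only, not material from the presentation). It checks whether a set of test runs is a pairwise, or strength-2, covering array: every pair of factors must show every combination of values in at least one run. The three on/off factors and the four-run design are hypothetical.

```python
from itertools import combinations, product


def covers_all_pairs(runs, levels):
    """Return True if every pair of factors takes every combination of
    values in at least one run (a strength-2 covering array)."""
    n_factors = len(levels)
    for i, j in combinations(range(n_factors), 2):
        needed = set(product(levels[i], levels[j]))
        seen = {(run[i], run[j]) for run in runs}
        if not needed <= seen:
            return False
    return True


# Hypothetical example: three on/off settings (0 = off, 1 = on).
levels = [(0, 1), (0, 1), (0, 1)]
full_factorial = list(product(*levels))  # every combination of settings
pairwise_runs = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

print(len(full_factorial), "runs in the full factorial")
print(len(pairwise_runs), "runs cover all pairs:",
      covers_all_pairs(pairwise_runs, levels))
```

In this toy case, four runs exercise every pairwise interaction that the eight-run full factorial would; the saving grows quickly as the number of factors increases.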

Friday, May 2nd, 2014 – noon (monthly lunch):
Location: Dales Indian Restaurant, RTP location: 5410 NC Highway 55, Durham, NC 27713

Thursday, April 17th, 2014 – 6:30 p.m. (regular meeting):
Location: Mattie B’s Public House

Have an Idea, Need an Idea!

1) Come in with an idea you’d like to discuss – either a problem that you’re stuck on, or a great idea where you’d like some feedback.
2) Be ready to present to a small group of about 4-6 people while you enjoy the great food and craft beer at Mattie B’s. This is a sit down presentation. You can bring your laptop and show some code if you want, but this is mostly a chance to “think out loud” with some interested folks.
One of the biggest interest areas from the Feedback survey was “Want to connect with peers,” but the social events got the most votes for “least favorite” meetings. This is a chance to find people who are interested in some of your favorite topics!

Friday, April 4th, 2014 – noon (monthly lunch):
Location: Shiki – Sushi & Asian Fusion

Thursday, March 20th, 2014 – 6:30 p.m. (regular meeting):
Location: MetLife has generously provided space for this event.

Dan Kelly presented: “Random Forests and Boosted Trees”

One of the most-used predictive modeling techniques, the decision tree is easy to interpret and has good predictive properties. But single decision trees can overfit your data and give misleading results. How do you decide when the tree has enough “branches”? Enter the random forest.

We had a discussion on our new mission statement at this meeting (attendees shared with us how they would like us to serve them and what they envision the Research Triangle Analysts to become in the future). After party at MEZ Contemporary Mexican Restaurant.

Friday, March 7th, 2014 – noon (monthly lunch):
Location: Bonefish Grill
We brainstormed on starting a nonprofit organization.

Thursday, February 20th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Lucia Gjeltema presented: “Community Detection”

Network graph analysis is a hot topic in social media, fraud detection, and academia. In many applications, networked individuals end up in one large “clump”, making further analysis nearly impossible. Community detection is one way to break a huge graph into small meaningful groups for real-world analyses.
Various structural definitions of graph communities were introduced and an overview of algorithms that capture them was given. The presentation was concluded with a review of performance metrics that compare detected communities with ground-truth information.

Friday, February 7th, 2014 – noon (monthly lunch):
Location: Backyard Bistro
We discussed starting a nonprofit organization.

Thursday, January 16th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Tim Hopper presented: “Intro to Scikit-Learn”

Scikit-learn is an actively developed Python package providing an implementation of many machine learning algorithms (e.g. SVM, kNN, linear models, HMM, k-Means, spectral clustering). However, the benefits of Scikit-learn go well beyond carefully implemented learning algorithms. Being built in Python, it allows easy integration with countless other Python modules for tasks such as plotting, data munging, and application development. Its consistent API across algorithms allows for rapid experimentation with multiple learning methods. Also, Scikit-learn is well documented and provides lots of examples.

Instead of discussing particular machine learning algorithms provided by the package, I will focus on Scikit-learn and Python as a toolkit for solving data problems from start to finish. I will emphasize the Pipeline tool which allows the user to chain together all the steps of a machine learning pipeline including preprocessing, dimensionality reduction, feature selection, and model fitting.

## ARCHIVED 2013 Events

Thursday, November 21st, 2013 – 6:30 p.m. (planning meeting):
Location: Larry’s Southern Kitchen

Plan next year! This has been a great year for RTA. We now have 100 members on Meetup, and we’ve had some amazing speakers and guests. Help us plan to make the group even bigger and better next year.

Friday, November 1st, 2013 – noon (monthly lunch):
Location: BurgerFi – Cary

November is RTA planning month! Join us for a lunchtime roundtable on where the analytics field is heading and what we should do next year.

Thursday, October 17th, 2013 – 6:30 p.m. (regular meeting):
Location: Buffalo Brothers on Lake Boone Trail
4025 Lake Boone Trail Suite #100, Raleigh, NC 27607

Dahl Winters presented: “Scaling the Big Data Mountain”.

In this whirlwind hour I will attempt to blaze a trail through the wilderness that is big data science.  Given a mountain of unstructured data and the jungle of options in the Hadoop ecosystem, it can be difficult to know which tools to use for which investigations.  We will take a guided tour of the most common Hadoop use cases, peer into NoSQL and graph databases, march over to machine learning, avoid sinking into deep learning, and cover some of the classification and clustering algorithms I’ve worked with in my big data explorations.  If you can survive this hour unscathed, you will be that much more prepared to tackle your own big data mountain.

Thursday, September 19th, 2013 – 6:30 p.m. (regular meeting):
Location: Cisco Systems 7200 Kit Creek Rd; Morrisville, NC
Building 11, First Floor, Conference Room E-UNION

“Big Data Analytics and CyberSecurity”

No food at this event. After party instead at Trali Irish Pub’s NEW LOCATION.

Big data is expected to play a crucial role in the cybersecurity landscape. Learn how the security industry is using big data analytics and integrating Artificial Intelligence techniques (statistical analysis, autonomic/agent-based computing, ensemble classification, game-theoretic self-optimization) within the framework of distributed, intelligent, and forward-thinking security architecture. For example, Cisco is using these techniques to create solutions in the domain of Network Behavior Analysis (NBA), in order to fight against modern sophisticated attacks in today’s cyberspace, including Advanced Persistent Threats (APT), exploit kits, zero-day attacks, polymorphic malware and trojans inside the client’s network.

Thursday, August 15th, 2013 – 6:30 p.m. (regular meeting):
Location: Saladelia

“Tool Throwdown: Kaggle competition – Titanic dataset”

RTA founders demonstrated their predictive modeling skills using their favorite statistics and programming tools. On display will be SAS, R, JMP, and maybe more!
Description: Analyzing the Titanic data set from the Kaggle competition.

Thursday, July 18th, 2013 – 6:30 pm (regular meeting):
Location: This month’s event space has been graciously provided by the Institute for Advanced Analytics at North Carolina State University.

Oscar Boykin presented: “Sketching and Streaming: building large-scale, real-time relevance features at Twitter”. 

We will discuss approximation algorithms for fast, cheap and accurate aggregation, which are used in production at Twitter. We will also briefly cover the open source software we released to do this: scalding, algebird and storm.

Dan Kelly presented: “Assessment and Comparison of Predictive Models with Binary Targets”, a practical guide for people who are doing predictive models.

Oscar Boykin is a native of Raleigh. He is currently on the analytics infrastructure team at Twitter, and co-creator of the Twitter open source projects: scalding, algebird, bijection, chill, and summingbird.

Thursday, June 20th, 2013 – 6:30 pm (regular meeting):
Location: Saladelia (in their back room)
4201 University Drive, Durham, NC 27707

Ian Cook presented: “Workshop on submitting R jobs to the cloud”

Bring your laptops, enjoy the wifi and great food, and talk about data!

Thursday, May 16th, 2013 – 6:30 pm (regular meeting):
Location: The Cuban Revolution, 318 Blackwell St, Durham, NC
Social / Networking meeting.

Thursday, April 25th, 2013 – 5:30 pm and 6:30 pm (regular meeting):
Location: The Cuban Revolution, 318 Blackwell St, Durham, NC and then the Durham Bulls Ball Park

Adam Sobsey presented: “Sabermetrics”

Adam writes for Baseball Prospectus, one of the premier publications for baseball statistics. “Our way of understanding baseball has undergone a revolution during the last generation. The field of baseball study known as “sabermetrics” (based on the acronym of the Society for American Baseball Research) has made huge advances in our approach to the complexity of the game, much of it via more thorough and sophisticated statistical analysis (aided by technological innovations, as well). Among the results of all this study is the essential sabermetric concept of the “Replacement Player.” The Replacement Player is an important but somewhat nebulous platonic ideal. The prevailing agreement is that he is basically good enough to play at the Triple-A minor-league level — the highest level below the major leagues — but does not have the skills to succeed for long stretches in the majors themselves. As it happens, the Durham Bulls are a Triple-A baseball team, all of its players striving to surpass and escape “replacement level” baseball. My talk will discuss some of the ways in which sabermetrics has changed our understanding of the game of baseball for the good, and some of the ways in which that understanding is still a work in progress–all against the very real backdrop of the men playing the game itself.”

Tuesday, April 2nd, 2013 – 6:30 pm (special event):
Location: Cuban Revolution, 318 Blackwell St, Durham NC

John D. Cook: Information is Cheap, Meaning is Expensive: How to Hire and Work With an Analyst (without breaking the bank)

More and more companies are investing in information, through better databases and more robust data tools. Many are finding, however, that extracting meaning from all that information is more difficult than they thought. There are many analysts who can assist–either as freelancers or employees–but how do you know you’re hiring the right talent? Should you hire a full-time analyst or a contractor? How much should you pay? What skills should they have?

John D. Cook has over 20 years of experience applying mathematics to real-world problems. He has worked with firms large and small, using his skills and expertise to turn the data they have into the information they need. During this question and answer session, John will discuss how to connect with the right talent, how to budget for an analysis project, and what to expect from an expert analyst.

Friday, April 5th, 2013 – noon (monthly lunch):
Location: Serena’s in RTP, 5311 South Miami Blvd, Durham NC

Michael Blanks presented: “Open Data & Government”.

Thursday, March 21st, 2013 – 6:30 pm (regular meeting):
Location: Louie & Charlie’s Grille & Tavern

John Sall presented: “From Big Data to Big Statistics”.

When you scale up the analysis, you have a lot of issues to address. When you have a lot of data, even a small difference is significant. When you screen a lot of hypotheses, adjusting for selection or multiple test bias is an issue. When you have a lot of bad data, making the analysis automatically robust becomes important. When you have big data, you need to make the computer work fast to get the job done. When you have thousands of results, you need to create compact summaries to show you all the results in one page, or at least produce the results sorted by significance. All these issues need to be resolved and the solutions encapsulated into a workflow for engineers and scientists that deal with more data each year.

John Sall is a co-founder and executive vice president of SAS Institute. He leads the JMP Division of SAS.

Friday, March 1st, 2013 – noon (monthly lunch):
Location: Serena’s

Bruce Connor led a discussion on “analytics for polling data”.

This discussion focused on the methods behind Nate Silver’s election predictions. Participants were invited to discuss the methods and their experience with other applications of the same methods.

Thursday, February 21st, 2013 – 6:30 pm (regular meeting):
Location: Louie & Charlie’s Grille & Tavern, 115 Page Point Circle, Durham, NC 27703

Melinda Thielbar presented: “Data Science is not a Fad. Let’s Keep it That Way”.

This presentation discusses the technical details of data science in the context of time series analysis and statistical modeling. It is a good presentation for anyone interested in a hype-free primer on data science.

Friday, February 1st, 2013 – noon (monthly lunch):
Location: Neomonde, 10235 Chapel Hill Road, Morrisville, NC 27560

Linda Schumacher presented: “Running a Kaggle Competition team”.

RTA will be organizing a Kaggle team this year! Anyone who is interested in joining the team or just learning more about Kaggle will benefit from this meeting.

Thursday, January 24th, 2013 – 6:30pm (regular meeting):
Location: Louie & Charlie’s Grille & Tavern, 115 Page Point Circle, Durham, NC 27703

Eric Yount presented: “Analytic Methods for Clinical Data”.

This will be very informative for those who are primarily working in data mining and business analytics. The techniques Eric will discuss and the reasoning behind them present a different way of looking at data. Clinical trials experts will have an opportunity to discuss the process of collecting and analyzing clinical data.

## ARCHIVED 2012 Events

Thursday November 15th, 2012 – 6:30 PM
Social/Networking Meeting
“Looking Ahead to 2013”
Location: Chow

Thursday October 18th, 2012 – 6:30 PM
“Educating Analysts: How Can Schools Prepare Students for a Quantitative Career?”
by Bill Burpitt, Associate Dean at the School of Business, Elon University
Location: Chow

Thursday September 20th, 2012 – 6:30 PM
Social/Networking Meeting
Location: Cuban Revolution

Thursday August 16th, 2012 – 6:30 PM
“Why the Future Will Convert Better” by Martin (Marty) Smith
Location: Chow

Thursday July 19th, 2012 – 6:30 PM
Social/Networking Meeting
Location: Cuban Revolution

Thursday June 21st, 2012 – 6:30 PM
“Applications of R and R Mini Hack-A-Thon”, led by Ian Cook, TIBCO Spotfire
Location: Chow

Thursday May 17th, 2012 – 6:30 PM
Presentation by MaxPoint Interactive
Location: MaxPoint Interactive

Thursday April 19th, 2012 – 6:30 PM
Social/Networking Meeting
Location: Wild Wing Cafe in Brier Creek

Thursday March 15th, 2012 – 6:30 PM
“Have an Idea, Need an Idea”
Location: Earth Fare

Thursday February 16th, 2012 – 6:30 PM
Social/Networking Meeting
Location: Trali Irish Pub

Thursday January 19th, 2012 – 6:30 PM
First Triangle Analysts Social/Networking Meeting!
Location: Trali Irish Pub