We schedule all our events through our “Research Triangle Analysts” meetup group. You can also view all events in calendar form. We meet on the third Thursday of every month at 6:30 p.m. (location and topic to be announced). We also have a monthly lunch on the first Friday of every month at noon (location to be announced).


Tim Hopper presents: “Pyspark”

Thursday, February 19th, 2015 – 6:30 p.m. (regular meeting):
Location: Bronto Software Inc.
Suite 410, 324 Blackwell Street, Durham, NC

Apache Spark is a next-generation cluster computing framework and data processing engine. By combining Spark’s primitive operations in a functional style, the user can perform complex computations on large datasets. Though similar to Hadoop, Spark relies much more heavily on RAM (instead of HDFS) and has been shown to run up to 100x faster than Hadoop for some applications. This talk will introduce Spark in general and then show PySpark, the Python wrapper around core Spark, as a tool for rapid, interactive analytics as well as robust, production data pipelines. Finally, we will look at MLlib, Spark’s distributed machine learning library.
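To give a feel for the functional style the abstract mentions, here is a minimal sketch in plain Python (not PySpark itself, and not taken from the talk). It counts words by chaining three small steps that loosely mirror the RDD operations `flatMap`, `map`, and `reduceByKey`. The sample lines and the helper `add_pair` are made up for illustration, and everything runs on one machine instead of a cluster.

```python
from functools import reduce

# Hypothetical sample input; in Spark this would be a distributed dataset.
lines = [
    "spark makes big data simple",
    "big data needs big clusters",
]

# Step 1, similar in spirit to flatMap: split each line and flatten into one word list.
words = [word for line in lines for word in line.split()]

# Step 2, similar in spirit to map: pair every word with a count of 1.
pairs = map(lambda word: (word, 1), words)


def add_pair(counts, pair):
    """Fold one (word, n) pair into the running dictionary of totals."""
    word, n = pair
    counts[word] = counts.get(word, 0) + n
    return counts


# Step 3, similar in spirit to reduceByKey: sum the counts for identical words.
word_counts = reduce(add_pair, pairs, {})

print(word_counts)
```

In real PySpark the same chain would be expressed on an RDD or DataFrame, and Spark would decide how to split the work across machines.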

Tim Hopper is a software engineer at a web analytics startup. He has a master’s in operations research from North Carolina State University.


Friday, March 6th, 2015 – noon (monthly lunch):
Location: TBD

Analytics Forward – An Unconference

Saturday, March 14th, 2015 (a.k.a. 3/14/15, a.k.a. Pi Day) – 8 a.m. to 5 p.m. (special event):
Location: Blue Cross Blue Shield NC, 4727 University Place, Durham, NC 

Analytics Forward is a free unconference by and for analytics professionals. Spend a Saturday at the Blue Cross and Blue Shield of North Carolina campus (thanks to our amazing sponsor) learning about the latest techniques, trends, and tools in analytics. Pitch a talk to share your knowledge, or just attend and connect with your peers in the Research Triangle.

## ARCHIVED 2015 Events


Friday, February 6th, 2015 – noon (monthly lunch):
Location: Moe’s Southwest Grill, 127 Weston Parkway, Cary, NC

Grant Ingersoll presented: “Solr 5: scalable search and analytics in one place”

Thursday, January 15th, 2015 – 6:30 p.m. (regular meeting):
Location: Renaissance Computing Institute (RENCI)
Suite 590, Biltmore Conference Room (Room No. 534), 100 Europa Drive, Chapel Hill, NC

Search engine technology is rapidly evolving from keyword-based lookups to a highly sophisticated ranking engine capable of incorporating many different features across complex data types. With the pending release of Apache Solr 5, it is now possible to ask more interesting questions of multi-structured content than ever before. In this talk, we’ll explore how Solr 5 provides a number of new and interesting features for analysts, ranging from incredibly easy data ingest to advanced faceting and statistical capabilities, and why Solr should be in every analyst’s toolbox.

Grant is the CTO and co-founder of LucidWorks, co-author of “Taming Text” from Manning Publications, co-founder of Apache Mahout and a long-standing committer on the Apache Lucene and Solr open source projects. Grant’s experience includes engineering a variety of search, question answering and natural language processing applications for a variety of domains and languages. He earned his B.S. from Amherst College in Math and Computer Science and his M.S. in Computer Science from Syracuse University.


Friday, January 2nd, 2015 – noon (monthly lunch):
Location: Nantucket Grill, 5925 Farrington Rd, Chapel Hill, NC 27517

## ARCHIVED 2014 Events

Thursday, November 20th, 2014 – 6:30 p.m. (planning meeting):
Location: MEZ Contemporary Mexican Restaurant, 5410 Page Road, Durham, NC

Plan next year! November is planning month at Research Triangle Analysts. Come have a beer and talk about what you want to do next year. We will send out a survey and go through the findings during our meetup.

Friday, November 7th, 2014 – noon (monthly lunch):
Location: Cilantro (Mediterranean Grill), 5400 South Miami Blvd, Suite 102, Durham, NC

Thursday, October 16th, 2014 – 6:30 p.m. (regular meeting):
Location: Chow, 311 Creedmoor Rd, Raleigh, NC 27613

Steve Geringer presented: “How to Build Effective Machine Learning Applications”

Machines are getting smarter every day. How do they do it? What will be left for the humans once the machines completely take over? Learn how you can contribute to the subjugation of mankind by building your own machine learning applications. While we won’t cover sci-fi or philosophical aspects, we will cover many important technical considerations for building effective machine learning (ML) applications.
Steve Geringer is a Triangle-area software consultant and ML enthusiast.

Friday, October 3rd, 2014 – noon (monthly lunch):
Location: Cafe Carolina and Bakery, 137 Weston Pkwy, Cary, NC 27513

Thursday, September 18th, 2014 – 6:30 p.m. (regular meeting):
Location: RENCI, 100 Europa Drive, Suite 540, Chapel Hill, NC

Elizabeth Claassen presented: “Improved Inference in Generalized Linear Mixed Models”

In small samples it is well known that the standard methods for estimating variance components in a generalized linear mixed model (GLMM), pseudo-likelihood and maximum likelihood, yield estimates that are biased downward. An important consequence of this is that inferences on fixed effects will have inflated Type I error rates because their precision is overstated. We introduce a new method for estimating parameters in GLMMs that applies a Firth bias adjustment to the maximum likelihood-based GLMM estimating algorithm. We apply this technique to one- and two-treatment logistic regression models with a single random effect. We show simulation results that demonstrate that the Firth-adjusted variance component estimates are substantially less biased than maximum likelihood estimates and that inferences using the Firth estimates maintain their Type I error rates more closely than the standard methods.

Friday, September 5th, 2014 – noon (monthly lunch):
Location: McDaids Irish Pub & Restaurant, Hillsborough Street, Raleigh, NC

Thursday, August 21st, 2014 – 6:30 p.m. (regular meeting):
Location: Blue Cross Blue Shield, 5901 Chapel Hill Blvd, Chapel Hill, NC

Laurel Trantham presented: “Utilization and Substitution of Urgent Care, Emergency Departments, and Primary Care Physicians”

Blue Cross Blue Shield is always looking to reduce healthcare costs. One driver of high costs is that many individuals receive medical care at emergency rooms when urgent care centers and primary care offices may be more appropriate sites of care.  Laurel Trantham reviewed some of the analysis in this area, including why this is important to explore, and discussed several modeling approaches being considered.

Friday, August 1st, 2014 – noon (monthly lunch):
Location: MEZ Contemporary Mexican Restaurant, 5410 Page Road, Durham, NC

Thursday, July 17th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Brian Fannin presented: “Statistics Without Borders”

Brian Fannin shared his experience as part of ‘Statistics Without Borders’. The team spent a week in Africa teaching R and statistical modeling to members of the Rwandan Biomedical Center.

Brian is not a proper statistician (he’s an actuary), but he loves R, loves to travel, and loves to try to make the world a better place through data. He especially loves doing all three at once.

Friday, July 11th, 2014 – noon (monthly lunch):
Location: An, 2800 Renaissance Park Place, Cary, NC 27513 (I-40 exit 287)

Thursday, June 19th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Mason DeCamillis presented: “Introduction to Julia”

Julia is a relatively new programming language that aims to blend the good parts of Matlab, C, R, and Python (with fewer of the bad parts). Its growth in popularity makes it an increasingly promising option for programmers doing technical, computationally intensive work. This presentation explored the advantages of Julia in a data analysis context, with examples from both the base library and several user-written packages.

Mason DeCamillis is a statistical programmer and data analyst with a Master’s degree in Applied Statistics and a knack for crashing his computer by testing out experimental software. He is cautiously enthusiastic about Julia and is excited to share with Research Triangle Analysts.

Friday, June 6th, 2014 – noon (monthly lunch):
Location: City Beverage, 4810 Hope Valley Rd, Durham, NC 27707

Thursday, May 15th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Joseph Morgan presented: “Covering Arrays”

Software (and analytical model) testing may require considering hundreds or thousands of parameters. The usual “test all” or “full factorial” methods can require too many runs to be practical. Covering arrays make it possible to achieve “full coverage” of a software suite with a smaller number of runs.
Joseph Morgan, Senior Software Developer for JMP at SAS Institute, presented his research on this important field.
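As a rough illustration of the covering-array idea described above (a sketch, not material from the presentation), the snippet below checks whether a set of test runs covers every pairwise combination of parameter values. The function name `covers_all_pairs` and the three binary parameters are hypothetical. For three on/off parameters, a full factorial design needs 8 runs, while the 4 runs shown already exercise every pair of settings.

```python
from itertools import combinations, product


def covers_all_pairs(runs, levels):
    """Return True if, for every pair of parameters, each combination
    of their values appears together in at least one run."""
    for i, j in combinations(range(len(levels)), 2):
        needed = set(product(levels[i], levels[j]))
        seen = {(run[i], run[j]) for run in runs}
        if not needed <= seen:
            return False
    return True


# Three hypothetical parameters, each either off (0) or on (1).
levels = [(0, 1), (0, 1), (0, 1)]

# Full factorial: every combination of all three parameters (8 runs).
full_factorial = list(product(*levels))

# A strength-2 covering array: only 4 runs, yet every pair is covered.
pairwise_runs = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]

print(len(full_factorial), len(pairwise_runs))
print(covers_all_pairs(pairwise_runs, levels))
```

The savings grow quickly with the number of parameters, which is why pairwise (or higher-strength) coverage is attractive when exhaustive testing is out of reach.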

Friday, May 2nd, 2014 – noon (monthly lunch):
Location: Dales Indian Restaurant, RTP location: 5410 NC Highway 55, Durham, NC 27713

Thursday, April 17th, 2014 – 6:30 p.m. (regular meeting):
Location: Mattie B’s Public House

Have an Idea, Need an Idea!

1) Come in with an idea you’d like to discuss – either a problem that you’re stuck on, or a great idea where you’d like some feedback.
2) Be ready to present to a small group of about 4-6 people while you enjoy the great food and craft beer at Mattie B’s. This is a sit-down presentation. You can bring your laptop and show some code if you want, but this is mostly a chance to “think out loud” with some interested folks.
One of the biggest interest areas from the Feedback survey was “Want to connect with peers,” but the social events got the most votes for “least favorite” meetings. This is a chance to find people who are interested in some of your favorite topics!

Friday, April 4th, 2014 – noon (monthly lunch):
Location: Shiki – Sushi & Asian Fusion

Thursday, March 20th, 2014 – 6:30 p.m. (regular meeting):
Location: MetLife has generously provided space for this event.

Dan Kelly presented: “Random Forests and Boosted Trees”

One of the most-used predictive modeling techniques, the decision tree is easy to interpret and has good predictive properties. But single decision trees can overfit your data and give misleading results. How do you decide when the tree has enough “branches”? Enter the random forest.

We had a discussion on our new mission statement at this meeting (attendees shared with us how they would like us to serve them and what they envision the Research Triangle Analysts becoming in the future). After party at MEZ Contemporary Mexican Restaurant.

Friday, March 7th, 2014 – noon (monthly lunch):
Location: Bonefish Grill
We brainstormed on starting a nonprofit organization.

Thursday, February 20th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Lucia Gjeltema presented: “Community Detection”

Network graph analysis is a hot topic in social media, fraud detection, and academia. In many applications, networked individuals end up in one large “clump”, making further analysis nearly impossible. Community detection is one way to break a huge graph into small meaningful groups for real-world analyses.
Various structural definitions of graph communities were introduced and an overview of algorithms that capture them was given. The presentation was concluded with a review of performance metrics that compare detected communities with ground-truth information.
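The abstract does not say which performance metrics were reviewed. As one common example (an assumption, not necessarily one covered in the talk), the Rand index compares a detected partition with ground truth by counting the node pairs on which the two agree: both place the pair in the same community, or both place it in different ones. The node names and labels below are hypothetical.

```python
from itertools import combinations


def rand_index(detected, truth):
    """Fraction of node pairs on which two community assignments agree.

    Both arguments map node -> community label. Label names do not need
    to match between the two assignments. Requires at least two nodes.
    """
    agree = 0
    total = 0
    for a, b in combinations(sorted(detected), 2):
        same_detected = detected[a] == detected[b]
        same_truth = truth[a] == truth[b]
        if same_detected == same_truth:
            agree += 1
        total += 1
    return agree / total


# Hypothetical ground truth: two communities of three nodes each.
truth = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

# Hypothetical detection result that misplaces node "c".
detected = {"a": "x", "b": "x", "c": "y", "d": "y", "e": "y", "f": "y"}

print(rand_index(detected, truth))
```

A value of 1.0 means the two partitions are identical up to relabeling. Chance-corrected variants such as the adjusted Rand index are often preferred in practice.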

Friday, February 7th, 2014 – noon (monthly lunch):
Location: Backyard Bistro
We discussed starting a nonprofit organization.

Thursday, January 16th, 2014 – 6:30 p.m. (regular meeting):
Location: Cameron Village Regional Library

Tim Hopper presented: “Intro to Scikit-Learn”

Scikit-learn is an actively developed Python package providing an implementation of many machine learning algorithms (e.g. SVM, kNN, linear models, HMM, k-Means, spectral clustering). However, the benefits of Scikit-learn go well beyond carefully implemented learning algorithms. Being built in Python, it allows easy integration with countless other Python modules for tasks such as plotting, data munging, and application development. Its consistent API across algorithms allows for rapid experimentation with multiple learning methods. Also, Scikit-learn is well documented and provides lots of examples.

Instead of discussing particular machine learning algorithms provided by the package, I will focus on Scikit-learn and Python as a toolkit for solving data problems from start to finish. I will emphasize the Pipeline tool which allows the user to chain together all the steps of a machine learning pipeline including preprocessing, dimensionality reduction, feature selection, and model fitting.
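A minimal sketch of the Pipeline idea follows. It is not the code from the talk. It assumes a current scikit-learn installation (the API may differ from the 2014 release), and the step names, the iris dataset, and the particular estimators are illustrative choices. The four steps follow the order named above: preprocessing, dimensionality reduction, feature selection, and model fitting.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset: 150 flowers, 4 numeric features, 3 classes.
X, y = load_iris(return_X_y=True)

# Each step is a (name, estimator) pair; the names are arbitrary labels.
pipeline = Pipeline([
    ("scale", StandardScaler()),                         # preprocessing
    ("reduce", PCA(n_components=3)),                     # dimensionality reduction
    ("select", SelectKBest(score_func=f_classif, k=2)),  # feature selection
    ("classify", LogisticRegression(max_iter=1000)),     # model fitting
])

# One call fits every step in order, feeding each step's output to the next.
pipeline.fit(X, y)

predictions = pipeline.predict(X)
training_accuracy = pipeline.score(X, y)
print(training_accuracy)
```

Because the whole chain behaves like a single estimator, it can be passed to cross-validation or grid search utilities, so the preprocessing is refit inside each fold along with the model.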

## ARCHIVED 2013 Events

Thursday, November 21st, 2013 – 6:30 p.m. (planning meeting):
Location: Larry’s Southern Kitchen

Plan next year! This has been a great year for RTA. We now have 100 members on Meetup, and we’ve had some amazing speakers and guests. Help us plan to make the group even bigger and better next year.

Friday, November 1st, 2013 – noon (monthly lunch):
Location: BurgerFi – Cary

November is RTA planning month! Join us for a lunchtime roundtable on where the analytics field is heading and what we should do next year.

Thursday, October 17th, 2013 – 6:30 p.m. (regular meeting):
Location: Buffalo Brothers on Lake Boone Trail
4025 Lake Boone Trail Suite #100, Raleigh, NC 27607

Dahl Winters presented: “Scaling the Big Data Mountain”.

In this whirlwind hour I will attempt to blaze a trail through the wilderness that is big data science.  Given a mountain of unstructured data and the jungle of options in the Hadoop ecosystem, it can be difficult to know which tools to use for which investigations.  We will take a guided tour of the most common Hadoop use cases, peer into NoSQL and graph databases, march over to machine learning, avoid sinking into deep learning, and cover some of the classification and clustering algorithms I’ve worked with in my big data explorations.  If you can survive this hour unscathed, you will be that much more prepared to tackle your own big data mountain.

Thursday, September 19th, 2013 – 6:30 p.m. (regular meeting):
Location: Cisco Systems 7200 Kit Creek Rd; Morrisville, NC
Building 11, First Floor, Conference Room E-UNION

“Big Data Analytics and CyberSecurity”

No food at this event. After party instead at Trali Irish Pub’s NEW LOCATION.

Big data is expected to play a crucial role in the cybersecurity landscape. Learn how the security industry is using big data analytics and integrating Artificial Intelligence techniques (statistical analysis, autonomic/agent-based computing, ensemble classification, game-theoretic self-optimization) within the framework of distributed, intelligent, and forward-thinking security architecture. For example, Cisco is using these techniques to create solutions in the domain of Network Behavior Analysis (NBA), in order to fight against modern sophisticated attacks in today’s cyberspace, including Advanced Persistent Threats (APT), exploit kits, zero-day attacks, polymorphic malware and trojans inside the client’s network.

Thursday, August 15th, 2013 – 6:30 p.m. (regular meeting):
Location: Saladelia

“Tool Throwdown: Kaggle competition – Titanic dataset”

RTA founders demonstrated their predictive modeling skills using their favorite statistics and programming tools. On display: SAS, R, JMP, and maybe more!
Description: Analyzing the Titanic data set from the Kaggle competition.

Thursday, July 18th, 2013 – 6:30 pm (regular meeting):
Location: This month’s event space has been graciously provided by the Institute for Advanced Analytics at North Carolina State University.

Oscar Boykin presented: “Sketching and Streaming: building large-scale, real-time relevance features at Twitter”. 

We will discuss approximation algorithms for fast, cheap and accurate aggregation, which are used in production at Twitter. We will also briefly cover the open source software we released to do this: scalding, algebird and storm.
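The abstract does not name specific algorithms, so the following is only an illustration of the kind of approximate aggregation it describes. It sketches a Count-Min sketch, a well-known streaming structure that estimates item frequencies in fixed memory. The class, its parameters, and the sample hashtags are assumptions for this example and are not Twitter's implementation.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency counter. Estimates never undercount, but
    hash collisions can make them overcount."""

    def __init__(self, width=200, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        # One deterministic hash per row, derived by salting with the row number.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._cells(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The smallest counter is the one least inflated by collisions.
        return min(self.table[row][col] for row, col in self._cells(item))


# Hypothetical stream of hashtags.
stream = ["#spark"] * 5 + ["#julia"] * 3 + ["#rstats"] * 2

sketch = CountMinSketch()
for tag in stream:
    sketch.add(tag)

print(sketch.estimate("#spark"))
```

Memory use depends only on `width` and `depth`, not on the number of distinct items, which is what makes this style of structure attractive for real-time streams.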

Dan Kelly presented: “Assessment and Comparison of Predictive Models with Binary Targets”, a practical guide for people who are doing predictive models.

Oscar Boykin is a native of Raleigh. He is currently on the analytics infrastructure team at Twitter, and co-creator of the Twitter open source projects: scalding, algebird, bijection, chill, and summingbird.

Thursday, June 20th, 2013 – 6:30 pm (regular meeting):
Location: Saladelia (in their back room)
4201 University Drive, Durham, NC 27707

Ian Cook presented: “Workshop on submitting R jobs to the cloud”

Bring your laptops, enjoy the wifi and great food, and talk about data!

Thursday, May 16th, 2013 – 6:30 pm (regular meeting):
Location: The Cuban Revolution, 318 Blackwell St, Durham, NC
Social / Networking meeting.

Thursday, April 25th, 2013 – 5:30 pm and 6:30 pm (regular meeting):
Location: The Cuban Revolution, 318 Blackwell St, Durham, NC and then the Durham Bulls Ball Park

Adam Sobsey presented: “Sabermetrics”

Adam writes for Baseball Prospectus, one of the premier publications for baseball statistics. “Our way of understanding baseball has undergone a revolution during the last generation. The field of baseball study known as “sabermetrics” (based on the acronym of the Society for American Baseball Research) has made huge advances in our approach to the complexity of the game, much of it via more thorough and sophisticated statistical analysis (aided by technological innovations, as well). Among the results of all this study is the essential sabermetric concept of the “Replacement Player.” The Replacement Player is an important but somewhat nebulous platonic ideal. The prevailing agreement is that he is basically good enough to play at the Triple-A minor-league level — the highest level below the major leagues — but does not have the skills to succeed for long stretches in the majors themselves. As it happens, the Durham Bulls are a Triple-A baseball team, all of its players striving to surpass and escape “replacement level” baseball. My talk will discuss some of the ways in which sabermetrics has changed our understanding of the game of baseball for the good, and some of the ways in which that understanding is still a work in progress–all against the very real backdrop of the men playing the game itself.”

Tuesday, April 2nd, 2013 – 6:30 pm (special event):
Location: Cuban Revolution, 318 Blackwell St, Durham NC

John D. Cook: Information is Cheap, Meaning is Expensive: How to Hire and Work With an Analyst (without breaking the bank)

More and more companies are investing in information, through better databases and more robust data tools. Many are finding, however, that extracting meaning from all that information is more difficult than they thought. There are many analysts who can assist, either as freelancers or employees, but how do you know you’re hiring the right talent? Should you hire a full-time analyst or a contractor? How much should you pay? What skills should they have?

John D. Cook has over 20 years of experience applying mathematics to real-world problems. He has worked with firms large and small, using his skills and expertise to turn the data they have into the information they need. During this question and answer session, John will discuss how to connect with the right talent, how to budget for an analysis project, and what to expect from an expert analyst.

Friday, April 5th, 2013 – noon (monthly lunch):
Location: Serena’s in RTP, 5311 South Miami Blvd, Durham NC

Michael Blanks presented: “Open Data & Government”.

Thursday, March 21st, 2013 – 6:30 pm (regular meeting):
Location: Louie & Charlie’s Grille & Tavern

John Sall presented: “From Big Data to Big Statistics”.

When you scale up the analysis, you have a lot of issues to address. When you have a lot of data, even a small difference is significant. When you screen a lot of hypotheses, adjusting for selection or multiple test bias is an issue. When you have a lot of bad data, making the analysis automatically robust becomes important. When you have big data, you need to make the computer work fast to get the job done. When you have thousands of results, you need to create compact summaries to show you all the results in one page, or at least produce the results sorted by significance. All these issues need to be resolved and the solutions encapsulated into a workflow for engineers and scientists that deal with more data each year.
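To make the multiple-testing point concrete, here is a small sketch (not from the talk) of two standard adjustments: the Bonferroni correction and the Benjamini-Hochberg procedure. The p-values are invented for illustration. Without any adjustment, five of the eight would fall below 0.05.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Control the false discovery rate: find the largest rank k whose
    sorted p-value is at most (k / m) * alpha, then reject the k smallest."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for index in order[:cutoff]:
        rejected[index] = True
    return rejected


# Hypothetical p-values from screening eight hypotheses.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]

unadjusted = [p <= 0.05 for p in p_values]
print(sum(unadjusted), sum(bonferroni(p_values)), sum(benjamini_hochberg(p_values)))
```

Bonferroni is the more conservative of the two. Benjamini-Hochberg tolerates a controlled share of false discoveries, which is often a better fit when thousands of hypotheses are screened.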

John Sall is a co-founder and executive vice president of SAS Institute. He leads the JMP Division of SAS.

Friday, March 1st, 2013 – noon (monthly lunch):
Location: Serena’s

Bruce Connor led a discussion on “analytics for polling data”.

This discussion focused on the methods behind Nate Silver’s election predictions. Participants were invited to discuss the methods and their experience with other applications of the same methods.

Thursday, February 21st, 2013 – 6:30 pm (regular meeting):
Location: Louie & Charlie’s Grille & Tavern, 115 Page Point Circle, Durham, NC 27703

Melinda Thielbar presented: “Data Science is not a Fad. Let’s Keep it That Way”.

This presentation discusses the technical details of data science, in context with time series analysis and statistical modeling. A really good presentation for anyone interested in a hype-free primer on data science.

Friday, February 1st, 2013 – noon (monthly lunch):
Location: Neomonde, 10235 Chapel Hill Road, Morrisville, NC 27560

Linda Schumacher presented: “Running a Kaggle Competition team”

RTA will be organizing a Kaggle team this year! Anyone who is interested in joining the team or just learning more about Kaggle will benefit from this meeting.

Thursday, January 24th, 2013 – 6:30pm (regular meeting):
Location: Louie & Charlie’s Grille & Tavern, 115 Page Point Circle, Durham, NC 27703

Eric Yount presented: “Analytic Methods for Clinical Data”

This will be very informative for those who are primarily working in data mining and business analytics. The techniques Eric will discuss and the reasoning behind them present a different way of looking at data. Clinical trials experts will have an opportunity to discuss the process of collecting and analyzing clinical data.

## ARCHIVED 2012 Events

Thursday November 15th, 2012 – 6:30 PM
Social/Networking Meeting
“Looking Ahead to 2013”
Location: Chow

Thursday October 18th, 2012 – 6:30 PM
“Educating Analysts: How Can Schools Prepare Students for a Quantitative Career?”
by Bill Burpitt, Associate Dean at the School of Business, Elon University
Location: Chow

Thursday September 20th, 2012 – 6:30 PM
Social/Networking Meeting
Location: Cuban Revolution

Thursday August 16th, 2012 – 6:30 PM
“Why the Future Will Convert Better” by Martin (Marty) Smith
Location: Chow

Thursday July 19th, 2012 – 6:30 PM
Social/Networking Meeting
Location: Cuban Revolution

Thursday June 21st, 2012 – 6:30 PM
“Applications of R and R Mini Hack-A-Thon”, led by Ian Cook, TIBCO Spotfire
Location: Chow

Thursday May 17th, 2012 – 6:30 PM
Presentation by MaxPoint Interactive
Location: MaxPoint Interactive

Thursday April 19th, 2012 – 6:30 PM
Social/Networking Meeting
Location: Wild Wing Cafe in Brier Creek

Thursday March 15th, 2012 – 6:30 PM
“Have an Idea, Need an Idea”
Location: Earth Fare

Thursday February 16th, 2012 – 6:30 PM
Social/Networking Meeting
Location: Trali Irish Pub

Thursday January 19th, 2012 – 6:30 PM
First Triangle Analysts Social/Networking Meeting!
Location: Trali Irish Pub