
Using R Studio to Explore 40 Years of Winning Results: The Statistics Behind the New York Lottery

By

How does the lottery drawing system work? Are the winning numbers, and each individual digit, really chosen at random? This project uses R Studio to investigate two of New York’s most popular lottery drawing games to see whether the numbers picked are as random as the Lottery Commission claims. To do so, it applies the chi-square goodness-of-fit test to compute p-values and assess whether there is evidence that the numbers are not drawn randomly.
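As a rough illustration of the chi-square goodness-of-fit test named above, here is a minimal Python sketch (the project itself used R). It assumes every digit is equally likely under the null hypothesis. The digit counts are hypothetical, not real lottery data, and the helper names `chi2_sf` and `chi_square_gof` are invented for this example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma function P(a, z).
    term = total = 1.0
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return max(0.0, 1.0 - lower)

def chi_square_gof(observed):
    """Test observed category counts against equal expected counts."""
    n, k = sum(observed), len(observed)
    expected = n / k
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, k - 1, chi2_sf(stat, k - 1)

# Hypothetical counts of digits 0-9 across 1,000 draws (not real lottery data).
digit_counts = [95, 108, 102, 97, 110, 93, 99, 104, 101, 91]
stat, df, p_value = chi_square_gof(digit_counts)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

A small p-value would count as evidence against uniform draws. A large one only means the test found no such evidence, not that randomness has been proven.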

Sentimental Analysis

By

To what extent has the portrayal of mental health in English literature evolved over the past century? I used R to apply data science techniques to fiction novels about mental illness published over the past century. The current final data includes the full analysis and visualizations for eight novels, and because the R code template is complete, many more books can be added as applicable. The final R document handles uploading, cleaning, and formatting the texts of the books.

Detecting Real Disasters through Twitter

By

The problem with tweets is that it is sometimes unclear whether the words literally mean what the writer wants to express. For example, a person called Anna K tweeted: “On plus side look at the sky last night it was ablaze.” The word “ABLAZE” here does not really mean “on fire.” It metaphorically suggests that the sky was so bright that it looked as if it were on fire, but it is certainly not a real disaster. Humans can easily identify the real meaning of this tweet. Our goal is to build a statistical model that allows a computer to predict whether a tweet is about a real emergency.
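The abstract does not say which statistical model was used. As one possible illustration only, the sketch below trains a small bag-of-words naive Bayes classifier with add-one smoothing. The training tweets are hypothetical examples written for this sketch, and the function names are assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """examples: list of (text, label), with label 1 = real disaster and 0 = not."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the higher log-posterior under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)  # add-one smoothing
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training tweets, invented for illustration.
training = [
    ("forest fire spreading near the highway evacuate now", 1),
    ("earthquake damaged several buildings downtown", 1),
    ("flood waters rising residents evacuate", 1),
    ("the sky last night was ablaze with color", 0),
    ("this concert is fire what a night", 0),
    ("my phone battery died what a disaster lol", 0),
]
model = train(training)
print(predict(model, "wildfire forces residents to evacuate"))
print(predict(model, "the sunset sky was ablaze tonight"))
```

With so few examples the model only memorizes word frequencies. A realistic classifier would need a large labeled dataset and held-out evaluation.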

GreenCafe Website

By

The GreenCafe website would allow the GreenCafe staff to create events and would allow customers to reserve seats and access the menus. Most importantly, it allows the staff to manage all the reservations and to send invitations and reminders to their customers.

Creating a “Musician” by Deep Learning

By

This project creates computer-generated music and compares two deep learning models: the LSTM (long short-term memory) model and the MuseGAN (generative adversarial network) model. The models’ datasets are stored in two formats, MIDI and npz. MIDI is a format that people use to process music programmatically because computers can read it easily. Npz files group numbers into arrays and save them together in one file; the format comes from the NumPy package in Python.

An Analysis on the Importance of the Serve in Professional Tennis

By

This project is an attempt to provide an analysis of the serve in professional tennis and its association with win probability. We completed extensive data exploration and wrangling to both find important trends and to shape the data into usable forms to build models. The final model is a Bradley-Terry model that focuses on how first serve percentage (percent of times the first serve is made) is associated with win probability for different players.
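For readers unfamiliar with the Bradley-Terry model, the sketch below fits the basic version, in which the probability that player a beats player b is s_a / (s_a + s_b). It uses a standard iterative (minorization-maximization) update. It does not include the first serve percentage covariate described above, and the match records are hypothetical. The sketch assumes every player has at least one win.

```python
def fit_bradley_terry(wins, iterations=200):
    """wins[(a, b)] = number of times a beat b. Returns strengths that sum to 1."""
    players = sorted({p for pair in wins for p in pair})
    strength = {p: 1.0 / len(players) for p in players}
    for _ in range(iterations):
        updated = {}
        for i in players:
            total_wins = sum(w for (a, _b), w in wins.items() if a == i)
            denom = 0.0
            for j in players:
                if j == i:
                    continue
                games = wins.get((i, j), 0) + wins.get((j, i), 0)
                denom += games / (strength[i] + strength[j])
            updated[i] = total_wins / denom
        norm = sum(updated.values())
        strength = {p: s / norm for p, s in updated.items()}
    return strength

def win_probability(strength, a, b):
    """Modeled probability that player a beats player b."""
    return strength[a] / (strength[a] + strength[b])

# Hypothetical head-to-head records among three players.
records = {
    ("A", "B"): 4, ("B", "A"): 2,
    ("B", "C"): 3, ("C", "B"): 3,
    ("A", "C"): 5, ("C", "A"): 1,
}
strengths = fit_bradley_terry(records)
print({p: round(s, 3) for p, s in strengths.items()})
print(round(win_probability(strengths, "A", "C"), 3))
```

Extending this to a covariate such as first serve percentage would typically mean modeling the log-strengths as a function of that covariate, which the sketch leaves out.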

Exploring Mathematical Models of the Transmission of COVID-19 and the Efficacy of Different Management Strategies

By

This summer, I worked on a project titled “Exploring mathematical models of the transmission of COVID-19 and the efficacy of different management strategies” with my mentor, Professor Rebecca Terry. My goal is to understand how coronavirus spreads within a population and explore how different factors affect transmission. Based on this exploration, I aim to consider how different management strategies may affect the spread of coronavirus through the population and compare the efficacy of different management strategies, such as quarantine and mask wearing.
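The abstract does not name the models explored. One common starting point for this kind of question is the SIR (susceptible, infected, recovered) compartmental model, sketched below with simple Euler time steps. All parameter values are illustrative assumptions rather than estimates for COVID-19, and a management strategy such as mask wearing is represented only as a lower transmission rate `beta`.

```python
def simulate_sir(beta, gamma, population=10_000, initial_infected=10,
                 days=300, dt=0.1):
    """Integrate the SIR equations with Euler steps and return summary values."""
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return {"peak_infected": peak, "ever_infected": i + r, "final_state": (s, i, r)}

# Illustrative values only: gamma = 0.1 means a 10-day average infectious period.
baseline = simulate_sir(beta=0.30, gamma=0.10)
with_masks = simulate_sir(beta=0.18, gamma=0.10)  # assumed reduction in transmission
print(round(baseline["peak_infected"]), round(with_masks["peak_infected"]))
```

Comparing the two runs shows how lowering the transmission rate flattens the peak. Quarantine could be sketched similarly by moving infected individuals into a separate non-transmitting compartment.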

Music and Emotion in Shiny

By

The end result of this project is a Shiny web app which generates four different visualizations of musical data. The data source is the Spotify API, which is easily accessible and provides data for many different artists. The Shiny app allows the user to type in whichever artist they want. First, they will see a simple interactive dotplot of the valence of an artist’s body of work. Valence, a variable used in all of the plots, is a measure of musical turbulence, which allows the user to interpret if a song sounds stable or unstable.

Using Deep Learning to Detect Fake Images

By

Camera quality on smartphones has been improving rapidly, along with facial recognition technology. Photo and video editing apps have become increasingly popular and sophisticated. While people use these apps for entertainment, their use has also raised concerns about how fake but realistic-looking videos may sow confusion, uncertainty, and doubt about the veracity of images. These manipulated videos and other digital representations produced by artificial intelligence are called “deepfakes.”

Statistical Analysis of the Representation of Women in the US Television Show, The Office

By

The Office was a culturally influential American comedy show that followed the lives and office interaction of a group of paper company workers, both male and female. The original research questions for the project were: “What is the balance of gender dialogue for each season and the episodes within each season?” and “What is the balance of character complexity for men and women throughout the series?” By exploring these topics through the lens of this show, the goal was to identify possible similar trends of gender representation across the entertainment industry as a whole.
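The first research question can be pictured as a grouped tally. The sketch below computes each gender's share of spoken words per season. It assumes the transcript has already been reduced to (season, speaker gender, word count) rows. The rows shown are hypothetical, and measuring dialogue in words rather than lines is an assumption of this example.

```python
from collections import defaultdict

def dialogue_share_by_season(rows):
    """rows: iterable of (season, gender, word_count). Returns {season: {gender: share}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for season, gender, words in rows:
        totals[season][gender] += words
    shares = {}
    for season, by_gender in totals.items():
        season_total = sum(by_gender.values())
        shares[season] = {g: count / season_total for g, count in by_gender.items()}
    return shares

# Hypothetical rows, not taken from the show's transcripts.
rows = [
    (1, "F", 120), (1, "M", 280),
    (2, "F", 150), (2, "M", 200), (2, "F", 50),
]
for season, share in sorted(dialogue_share_by_season(rows).items()):
    print(season, share)
```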

Injuries and Playing Surface in the NFL

By

Football professionals are always concerned about injury due to the high level of contact within the sport, but non-contact injuries are often overlooked. This summer, I investigated the relationship between non-contact injuries and the playing surface on which a player was injured. The data was provided by the NFL on Kaggle, a site where companies publish their data for data scientists to analyze. The data came in three separate datasets, which had to be combined to make a complete analysis.

Data Visualization for the Admissions Office

By

The Admissions Office at St. Lawrence University reaches out to students through email campaigns throughout their application process for reminders, yield, and general communication. Understanding the effectiveness of an email campaign and its reach allows outreach teams within the office to develop content more tailored to the needs of the recipients. The project consists of creating a data visualization web application that takes data from the Admissions and Financial Aid Customer Report Management System to create graphical representations of campaign metrics.

Special Cases of Crossing Numbers

By

During my fellowship, I studied crossing numbers of graphs. The crossing number of a graph is the minimum number of times edges cross in any drawing of that graph. This idea was first worked on in 1944 by Paul Turán, and one way it is applied is to plan roads in a city or rails in a storage yard. Most crossing number work has been done on complete graphs (Anthony Hill) or complete bipartite graphs (Kazimierz Zarankiewicz). I mainly worked with the crossing number of the generalized hypercube graph, focusing on the rectilinear crossing number.
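To make the rectilinear setting concrete, the sketch below counts edge crossings in one straight-line drawing of a graph. The count for any single drawing is only an upper bound on the rectilinear crossing number, which is the minimum over all such drawings. The sketch assumes no three vertices are collinear. The coordinates and function names are chosen for illustration and use complete graphs rather than the generalized hypercube graphs studied in the project.

```python
from itertools import combinations

def orient(p, q, r):
    """Signed area test: positive if p, q, r turn counterclockwise."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segments ab and cd cross at an interior point (general position)."""
    return (orient(a, b, c) * orient(a, b, d) < 0
            and orient(c, d, a) * orient(c, d, b) < 0)

def count_crossings(points, edges):
    """Count crossing pairs of edges in a straight-line drawing."""
    count = 0
    for (u, v), (w, x) in combinations(edges, 2):
        if len({u, v, w, x}) < 4:
            continue  # edges sharing a vertex meet only at that vertex
        if segments_cross(points[u], points[v], points[w], points[x]):
            count += 1
    return count

def complete_graph_edges(n):
    """All vertex pairs of the complete graph on n vertices."""
    return list(combinations(range(n), 2))

# K5 drawn with its vertices on a convex pentagon.
pentagon = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]
print(count_crossings(pentagon, complete_graph_edges(5)))

# K4 drawn as a triangle with one vertex inside.
triangle_plus_center = [(0, 0), (6, 0), (3, 6), (3, 2)]
print(count_crossings(triangle_plus_center, complete_graph_edges(4)))
```

The two drawings show how vertex placement changes the count: the convex pentagon forces every pair of diagonals from a four-vertex subset to cross, while placing a vertex inside a triangle removes all crossings for K4.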

Creating a Statistical Model for Evaluating the Concentration of Smog in China

By

Air pollution is the most serious problem in China, and people from other countries know only that the problem exists, not its trend or composition. I wanted to identify the trend in smog and use a statistical model to examine that trend and its correlations. What I enjoyed the most this summer were the daily meetings with Professor Ramler, in which we explored new methods for graphing different kinds of data that I had never learned in class and worked through the questions I had.

Mathematical Art and Artistic Mathematics

By

Though each discipline tends to be regarded as the antithesis of the other, mathematics and art intersect often and with fascinating results. This junction appears notably in the works of M.C. Escher, a Dutch artist who, despite doubting that he had any mathematical prowess, developed his own ideas of plane division, which would appear in his tessellations. These tessellations would inspire his interest in what mathematicians call plane crystallographic groups, or wallpaper groups, which classify wallpaper patterns, that is, two-dimensional repetitive patterns.

Investigate the Most Effective Card-Upgrade Strategy in Clash Royale

By

Clash Royale is a real-time strategy video game that allows two players to “battle” with their decks—a combination of eight cards, each associated with a level that can be increased through upgrades. Our goal was to investigate the most effective card-upgrade strategy across different decks while also taking into account the in-game currency required for upgrades.