The goal of this project was to train a convolutional neural network to label game camera images taken in the North Country. Once trained, the network is intended for use in Nature Up North projects that involve labeling thousands of images taken each year. Currently these images are labeled by volunteers, and the network could at least help sort them faster. By the conclusion of my project, I had trained two networks: the first sorted images by whether any animal was present, and the second sorted images by animal species.
Ferritin is a protein in the human body made up of a mix of two types of subunits, called H and L. Diseases such as anemia, cancer, and Alzheimer's have been linked to the malfunction of ferritin in the body. Therefore, research is being conducted to further understand how ferritin works. We have 2D images of ferritin, which, thanks to a technique called cryo-electron microscopy (cryo-EM), can be combined to form 3D models of the protein. Unfortunately, the pictures that we have are blurry.
The research for "You're a [Data] Wizard, Harry!" was conducted in R. We started by text mining and cleaning the script of the first Harry Potter film. From this script, we conducted sentiment analyses over the course of the film and for each individual character. We then applied the sentiment analyses to networks to understand the sentiment of dialogue interactions and relationships. Through these analyses, we were able to see the overall connotation of each relationship's dialogue in the first film specifically.
In this research project, I used data science techniques in R to analyze and visualize the interaction between positionality and the portrayal of mental health in literature. Specifically, I looked at about 40 YA fiction novels dealing with mental health issues and compared those with black characters/authors to those with white characters/authors, as well as looking at differences in gender and the impacts of intersectionality.
Network analysis is the process of analyzing the structure of a network through graphical and statistical procedures. In other words, network analysis looks at how a set of units (nodes) is connected through directed or undirected links (edges). For my research project, I explored network analysis and used statistical methods to investigate how the directional relationship of assists to goals influences winning percentages for Division III Liberty League soccer teams from 2013 to 2021.
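As a minimal sketch of the directed-network idea described above, the Python snippet below treats each assist as a directed edge from the assisting player to the scorer and tallies weighted in- and out-degrees. The player names, the pairs, and the helper names `build_directed_network` and `degree_summary` are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter

# Hypothetical (assister, scorer) pairs; not data from the study.
assists = [
    ("Avery", "Blake"),
    ("Avery", "Blake"),
    ("Casey", "Blake"),
    ("Blake", "Avery"),
    ("Casey", "Avery"),
]


def build_directed_network(pairs):
    """Map each directed edge (source, target) to its weight (number of assists)."""
    return Counter(pairs)


def degree_summary(edge_weights):
    """Return weighted out-degree (assists given) and in-degree (assisted goals scored)."""
    out_degree = Counter()
    in_degree = Counter()
    for (source, target), weight in edge_weights.items():
        out_degree[source] += weight
        in_degree[target] += weight
    return out_degree, in_degree


edges = build_directed_network(assists)
out_degree, in_degree = degree_summary(edges)

print("Avery -> Blake weight:", edges[("Avery", "Blake")])
print("Assists given by Casey:", out_degree["Casey"])
print("Assisted goals scored by Blake:", in_degree["Blake"])
```

Because the edges are directed, the edge from Avery to Blake is counted separately from the edge from Blake to Avery, which is what distinguishes this from an undirected network.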
Assessing Water Bottle Filling Station Use on St. Lawrence University Campus in Relation to Illness during the COVID-19 Pandemic
This project analyzed data from two surveys that I conducted in Spring 2021 and Spring 2022 concerning the water bottle filling stations on the St. Lawrence University campus. The goal was to determine students' perceptions of the filling stations through questions about their role in reducing the use of single-use plastic bottles, as well as the filling stations' cleanliness, maintenance, and water quality.
Drones are uncrewed vehicles that operate under the direction of software running on a “flight computer” – a computer that controls the drone as it travels along a pre-programmed path. This research dealt with increasing the resiliency of the drone in the face of computer failures. We developed a method for transplanting certain sensor data from one flight computer to another, allowing a backup computer to take over the job of the flight computer while the problem is fixed.
How does the lottery drawing system work? Is it true that the winning numbers, as well as each individual digit, are chosen at random? This project uses RStudio to investigate two of New York’s most popular lottery drawing games to see if the numbers picked are as random as the Lottery Commission claims. This was achieved by applying the chi-square goodness-of-fit test to determine p-values and identify whether or not there is evidence that the numbers are not drawn randomly.
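The goodness-of-fit idea can be sketched in a few lines. The project itself used R; the Python snippet below is only an illustration under stated assumptions: the digit samples are made up, the function name `chi_square_uniform` is hypothetical, and instead of computing a p-value it compares the statistic to the standard 5% critical value for 9 degrees of freedom (16.919).

```python
from collections import Counter

# Chi-square critical value for 9 degrees of freedom at the 5% level.
CRITICAL_VALUE_DF9_ALPHA_05 = 16.919


def chi_square_uniform(digits, categories=range(10)):
    """Chi-square statistic for the null hypothesis that every category is equally likely."""
    categories = list(categories)
    counts = Counter(digits)
    expected = len(digits) / len(categories)
    return sum((counts[c] - expected) ** 2 / expected for c in categories)


# Made-up samples of 500 drawn digits each; not real lottery data.
perfectly_even = list(range(10)) * 50
skewed = [7] * 140 + [d for d in range(10) if d != 7] * 40

for name, sample in [("even", perfectly_even), ("skewed", skewed)]:
    statistic = chi_square_uniform(sample)
    reject = statistic > CRITICAL_VALUE_DF9_ALPHA_05
    print(f"{name}: statistic = {statistic:.1f}, evidence against randomness: {reject}")
```

A statistic above the critical value corresponds to a p-value below 0.05, which would count as evidence that the digits are not uniformly distributed.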
To what extent has the portrayal of mental health in English literature evolved over the past century? I used R to apply data science analysis techniques to fiction novels about mental illness that span the past century. The current final data includes the full analysis and visualizations from eight novels, but thanks to the successful completion of the R code template, many more books can be added as applicable. The final R document allows the uploading, cleaning, and formatting of the texts of the books.
The problem with tweets is that it is sometimes unclear whether the words literally mean what the writer wants to express. For example, a user called Anna K tweeted: “On plus side look at the sky last night it was ablaze.” The word “ablaze” here does not really mean “on fire.” It metaphorically suggests that the sky was so bright that it looked as if it were on fire, but it certainly does not describe a real disaster. Humans can easily identify the real meaning of this tweet. Our goal is to build a statistical model that allows a computer to predict whether a tweet is about an emergency.
The Greencafe website allows the Greencafe staff to create events and allows customers to reserve seats. It also allows customers to access the menus. Most importantly, it allows the staff to manage all the reservations. Staff are also able to send invitations and reminders to their customers.
This project creates computer-generated music and compares two deep learning models: the LSTM (long short-term memory) model and the MuseGAN model, a generative adversarial network. MIDI and npz are the two formats used for the models' datasets. MIDI is a format commonly used to process music programmatically because it can be easily read by computers. An npz file groups numbers into arrays and saves those arrays together in a single file. It is a format defined by the NumPy package in Python.
This project is an attempt to provide an analysis of the serve in professional tennis and its association with win probability. We completed extensive data exploration and wrangling to both find important trends and to shape the data into usable forms to build models. The final model is a Bradley-Terry model that focuses on how first serve percentage (percent of times the first serve is made) is associated with win probability for different players.
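For readers unfamiliar with the Bradley-Terry model, the sketch below fits the basic version, in which each player has a positive strength and the probability that player i beats player j is strength_i / (strength_i + strength_j). It is an illustration only: the win counts are made up, the function names are hypothetical, the fitting method (a standard iterative update) is an assumption rather than the method used in the project, and the first-serve-percentage covariate described above is not included.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a matrix where wins[i][j] is
    the number of times player i beat player j (diagonal entries are 0)."""
    n = len(wins)
    strength = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Sum over opponents of (matches played) / (combined current strengths).
            denominator = sum(
                (wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denominator)
        total = sum(updated)
        strength = [s / total for s in updated]  # normalize so strengths sum to 1
    return strength


def win_probability(strength, i, j):
    """Model probability that player i beats player j."""
    return strength[i] / (strength[i] + strength[j])


# Made-up head-to-head results among three players; not real tennis data.
wins = [
    [0, 7, 8],
    [3, 0, 6],
    [2, 4, 0],
]

strength = fit_bradley_terry(wins)
print("Estimated strengths:", [round(s, 3) for s in strength])
print("P(player 0 beats player 2):", round(win_probability(strength, 0, 2), 3))
```

With only two players, the fitted win probability reduces to the observed win fraction, which is a quick sanity check on the update rule.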
Exploring Mathematical Models of the Transmission of COVID-19 and the Efficacy of Different Management Strategies
This summer, I worked on a project titled “Exploring mathematical models of the transmission of COVID-19 and the efficacy of different management strategies” with my mentor, Professor Rebecca Terry. My goal was to understand how coronavirus spreads within a population and explore how different factors affect transmission. Based on this exploration, I aimed to consider how different management strategies may affect the spread of coronavirus through the population and to compare the efficacy of strategies such as quarantine and mask wearing.
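One common way to explore questions like these is a compartmental SIR (susceptible, infected, recovered) model. The abstract does not say which model was used, so the sketch below is an assumption for illustration only: the function name `simulate_sir`, the parameter values, and the choice to represent mask wearing as a halved transmission rate are all hypothetical.

```python
def simulate_sir(beta, gamma, population=1000.0, initial_infected=1.0,
                 days=365, steps_per_day=10):
    """Integrate a basic SIR model with Euler steps.

    beta is the transmission rate and gamma the recovery rate (both per day).
    Returns (peak number infected, final S, final I, final R).
    """
    dt = 1.0 / steps_per_day
    s = population - initial_infected
    i = initial_infected
    r = 0.0
    peak = i
    for _ in range(days * steps_per_day):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak, s, i, r


# Hypothetical parameters: the "masks" scenario simply halves the transmission rate.
baseline_peak, *_ = simulate_sir(beta=0.30, gamma=0.10)
masked_peak, *_ = simulate_sir(beta=0.15, gamma=0.10)

print(f"Peak infected without intervention: {baseline_peak:.0f}")
print(f"Peak infected with reduced transmission: {masked_peak:.0f}")
```

Comparing the two peaks shows how a management strategy that lowers the transmission rate flattens the epidemic curve in this simplified setting.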
The end result of this project is a Shiny web app that generates four different visualizations of musical data. The data source is the Spotify API, which is easily accessible and provides data for many different artists. The Shiny app allows the user to type in whichever artist they want. First, they will see a simple interactive dot plot of the valence of an artist’s body of work. Valence, a variable used in all of the plots, is Spotify's measure of musical positiveness, which allows the user to interpret whether a song sounds positive (for example, happy or cheerful) or negative (for example, sad or angry).
Camera quality on smartphones has been improving rapidly along with facial recognition technology. Photo and video editing apps have become increasingly popular and sophisticated. While people use these apps for entertainment, their spread has also raised concerns about how fake but realistic-looking videos may sow confusion, uncertainty, and doubt about the veracity of images. These manipulated videos and other digital representations produced by artificial intelligence are called “deepfakes”.
The Office was a culturally influential American comedy show that followed the lives and office interactions of a group of paper company workers, both male and female. The original research questions for the project were: “What is the balance of gender dialogue for each season and the episodes within each season?” and “What is the balance of character complexity for men and women throughout the series?” By exploring these topics through the lens of this show, the goal was to identify possible similar trends in gender representation across the entertainment industry as a whole.
Football professionals are always concerned about injury because of the high level of contact in the sport, but non-contact injuries are often overlooked. This summer, I set out to investigate the relationship between non-contact injuries and the playing surface on which a player was injured. The data was provided by the NFL on Kaggle, a site where companies publish their data for data scientists to analyze. The data came in three separate datasets, which had to be combined for a complete analysis.
The Admissions Office at St. Lawrence University reaches out to students through email campaigns throughout their application process for reminders, yield, and general communication. Understanding the effectiveness of an email campaign and its reach allows outreach teams within the office to develop content more tailored to the needs of the recipients. The project consists of creating a data visualization web application that takes data from the Admissions and Financial Aid Customer Report Management System to create graphical representations of campaign metrics.
During my fellowship, I studied crossing numbers of graphs. The crossing number of a graph is the minimum number of times edges cross over all drawings of that graph in the plane. This idea was first worked on in 1944 by Paul Turán, and one way it is applied is to plan roads in a city or rails in a storage yard. Most work on crossing numbers has dealt with complete graphs (Anthony Hill) or complete bipartite graphs (Kazimierz Zarankiewicz). I mainly worked with the crossing number of the generalized hypercube graph, focusing on the rectilinear crossing number, in which edges must be drawn as straight line segments.
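To make the idea of crossings in a rectilinear drawing concrete, the sketch below counts how many pairs of straight-line edges cross for a given placement of vertices. It is an illustration only: the function names and vertex coordinates are hypothetical, the points are assumed to be in general position (no three on a line), and it counts crossings for one drawing rather than searching for the minimum over all drawings.

```python
from itertools import combinations


def orientation(p, q, r):
    """Signed area test: positive if p, q, r turn counterclockwise, negative if clockwise."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a, b, c, d):
    """True if segment ab and segment cd cross at a point interior to both."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0
            and orientation(c, d, a) * orientation(c, d, b) < 0)


def count_crossings(positions, edges):
    """Count crossing pairs of edges in a straight-line drawing."""
    crossings = 0
    for (u, v), (w, x) in combinations(edges, 2):
        if len({u, v, w, x}) < 4:
            continue  # edges sharing a vertex meet at that vertex, not at a crossing
        if segments_cross(positions[u], positions[v], positions[w], positions[x]):
            crossings += 1
    return crossings


# The complete graph on four vertices, drawn two different ways.
k4_edges = list(combinations(range(4), 2))
convex_drawing = {0: (0, 0), 1: (2, 0), 2: (2, 2), 3: (0, 2)}
nested_drawing = {0: (0, 0), 1: (4, 0), 2: (2, 4), 3: (2, 1)}

print("Crossings with vertices on a square:", count_crossings(convex_drawing, k4_edges))
print("Crossings with one vertex inside a triangle:", count_crossings(nested_drawing, k4_edges))
```

The two drawings of the same graph give different counts, which is why the crossing number is defined as a minimum over drawings.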
Air pollution is the most serious problem in China, and people in other countries often know only that the problem exists without understanding its trends and composition. I wanted to identify trends in smog and use a statistical model to examine those trends and their correlations. What I enjoyed most this summer were the daily meetings with Professor Ramler, in which we explored new methods for graphing different kinds of data that I had never learned in class and worked through the questions I had.
Though each discipline tends to be regarded as the antithesis of the other, mathematics and art intersect often and with fascinating results. This junction appears notably in the works of M.C. Escher, a Dutch artist who, despite doubting that he had any mathematical prowess, developed his own ideas of plane division, which would appear in his tessellations. These tessellations would inspire his interest in what mathematicians call plane crystallographic groups or wallpaper groups, which are classifications of wallpaper patterns, or two-dimensional repetitive patterns.
Clash Royale is a real-time strategy video game that allows two players to “battle” with their decks—a combination of eight cards, each associated with a level that can be increased through upgrades. Our goal was to investigate the most effective card-upgrade strategy across different decks while also taking into account the in-game currency required for upgrades.