New York City Crime Analysis
Abstract Summary
Schools reduce all types of crime in close proximity.
In terms of absolute crime, Staten Island was, is, and is predicted to be the safest borough in the near future.
In terms of decreasing crime rate, Queens will be the safest borough in the near future.
Precincts 17, 50, 66, 100, 111, 112, and 23 will be safest in the near future.
Introduction
This report aims to answer the question: Where is the safest place to live in New York City?
We, the authors of this report, are relocating to New York City (NYC) in the summer of 2024 to seek employment opportunities. Therefore, the findings in this report are of deep importance and relevance to us and anyone else who plans to move to the Big Apple in the near future. Safety and security are fundamental needs for every individual.
Background Info
To answer the question, place and safety must be defined.
Place is classified by borough and precinct, the former more general than the latter. NYC is divided into five boroughs: Manhattan, Brooklyn, Queens, the Bronx, and Staten Island. Precincts are geographical divisions within the city, each overseen by the New York City Police Department (NYPD). The NYPD currently operates 77 precincts.
Safety is measured by crime frequency and crime rate. Crimes are classified as felonies, misdemeanors, violations, or infractions, spanning from severe felonies like homicide to minor infractions such as speeding.
The data in this report is courtesy of NYC Open Data, Census Surveys, and EquityNYC.
Overview
Each offense is plotted below, grouped by the age range of the criminal.
Violations and infractions have decreased significantly since 2006, to the point where they are of little concern. Felonies have remained relatively level at around 90,000, with the uptick in 2007 and the downtick in 2020 only temporary. This indicates that felonies are relatively inert and are a persistent aspect of crime in NYC, resistant to reduction efforts.
The offense of most importance is the misdemeanor. Misdemeanors make up the highest proportion of crimes, and there is a consistent downtrend until 2020, after which there is a substantial uptrend. What is remarkable is how the proportion of misdemeanors committed by adolescents (<18) consistently decreased from a high of 10.08% in 2010 to a mere 2.22% in 2022. The proportion committed by young adults (18-24) also consistently decreased from a high of 28.71% in 2011 to a low of 15.97% in 2022. The other three age ranges seem to have absorbed this decrease, judging by their subtle increases in proportion over the years.
This seems to be attributable to more schools being opened across NYC. “[Schools decrease] the opportunities for [youth] to engage in nuisance crimes” through education, support, and preventive measures (Sandi, 2023). Hence, the next section delves into the impact of schools on crime.
School Analysis
It is commonly held that more schooling for youth correlates with declining juvenile delinquency rates. The graph below explores this notion by aggregating crime within a 500-meter radius of every school in NYC. The 500-meter radius was chosen to ensure comprehensive coverage of the area surrounding each school while avoiding overlap between neighboring schools. Crime data is analyzed from seven years before each school’s opening to seven years after, capturing the medium- to long-term impact of new schools on nearby crime.
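The radius-based aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout (`lat`, `lon`, `year`, `opened`), the function names, and the sample coordinates are all hypothetical, and the window is assumed to run from year -7 to year 6 relative to opening so that the post-opening years are 0 to 6.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def crimes_by_relative_year(school, crimes, radius_m=500, window=7):
    """Count crimes within radius_m of a school, keyed by year relative to its opening."""
    # Assumed window: offsets -7..6, so year 0 is the opening year.
    counts = {offset: 0 for offset in range(-window, window)}
    for crime in crimes:
        offset = crime["year"] - school["opened"]
        if offset in counts:
            dist = haversine_m(school["lat"], school["lon"], crime["lat"], crime["lon"])
            if dist <= radius_m:
                counts[offset] += 1
    return counts


# Hypothetical records for illustration only (not real NYC data).
school = {"lat": 40.7500, "lon": -73.9900, "opened": 2012}
crimes = [
    {"lat": 40.7510, "lon": -73.9900, "year": 2010},  # ~111 m north, 2 years before opening
    {"lat": 40.7500, "lon": -73.9850, "year": 2013},  # ~421 m east, 1 year after opening
    {"lat": 40.7600, "lon": -73.9900, "year": 2013},  # ~1.1 km north, outside the radius
]
counts = crimes_by_relative_year(school, crimes)
```

Summing these per-school dictionaries across all schools would give the aggregate curve by relative year. A spatial index would be preferable at city scale, since this sketch compares every crime against every school.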
Misdemeanors exhibit the most pronounced absolute delta among all four offenses, with a peak just above 100,000 occurrences decreasing to a low slightly above 30,000. While the other three offenses are just as noteworthy, they are overshadowed by the sheer volume of misdemeanors. Isolating each offense using Plotly aids in visualizing the abrupt decline in crime, which occurs slightly before or at the moment schools open and persists until the end of the seven-year time range. The only anomaly in this graph is the unusual choppiness in infractions before schools open. Since infractions are quite minor (e.g., improper parking), they are not explored further. What is worth exploring is the year-over-year crime change across these four offenses.
The anomaly regarding infractions is likewise present in the crime change graph. Regardless, the rate of change of all four offenses is always negative after schools open (felonies have a 1% uptick two years after schools open, but this is negligible). The long-term impact of schools opening is double-digit decreases in crime rates across infractions, misdemeanors, and violations. Felonies experience high single-digit decreases. As per the overview section, felonies exhibit sticky behavior year over year and are less influenced by external factors. To assess the strength of this overall decrease in crime rate due to schools, the Pearson correlation coefficient and the goodness of fit (R²) of a linear model are computed for each offense. Only the years after schools open are considered (years 0 to 6).
Offense | Correlation Coefficient | R² | Slope Value |
---|---|---|---|
Felony | -0.88 | 0.78 | -0.016 |
Infraction | -0.90 | 0.81 | -0.054 |
Misdemeanor | -0.94 | 0.89 | -0.030 |
Violation | -0.65 | 0.42 | -0.070 |
All offenses have negative Pearson correlation coefficients below -0.5, indicating a strong negative linear relationship. Additionally, all offenses except violations have R² values greater than 0.5, so the linear regression explains most of the variance in the change in felonies, misdemeanors, and infractions. Therefore, it is reasonable to conclude that the year-over-year change in felonies, misdemeanors, and infractions falls by roughly 1.6, 3.0, and 5.4 percentage points per year, respectively. The impact of this acceleration is visualized below across the entire time range.
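For a linear model with a single predictor, R² equals the square of the Pearson correlation coefficient, which is why the two columns in the table move together (for example, -0.90 squared is 0.81). The sketch below computes the correlation, R², and slope for one offense. The yearly change values are hypothetical placeholders, not the report's data, and `linear_fit` is an illustrative helper name.

```python
import math


def linear_fit(x, y):
    """Pearson r, R^2, slope, and intercept of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With one predictor, the coefficient of determination is r squared.
    return {"r": r, "r2": r * r, "slope": slope, "intercept": intercept}


# Years since schools opened (0 to 6) and a hypothetical year-over-year change series.
years = [0, 1, 2, 3, 4, 5, 6]
change = [-0.02, -0.06, -0.05, -0.10, -0.12, -0.13, -0.18]

fit = linear_fit(years, change)
print(f"r={fit['r']:.2f}  R2={fit['r2']:.2f}  slope={fit['slope']:.3f}")
```

A slope of -0.016, as reported for felonies, would mean the year-over-year change becomes about 1.6 percentage points more negative with each additional year after opening.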
The final observation supports the claim that new schools significantly reduce all types of crime in close proximity.
Time Series Analysis
This section of the report aims to determine the safest borough based solely on a forecast derived from month-over-month crime data.
In terms of absolute crime count, Staten Island is by far the safest borough, followed by Queens, the Bronx, Manhattan, and Brooklyn, respectively. This order is maintained throughout the years, except during black swan events such as the COVID-19 pandemic in 2020, after which the order is quickly regained. Determining which borough’s crime rate is decreasing the most, however, requires further analysis.
Both graphs above illustrate how all five boroughs appear to be correlated to a strong degree. A correlation matrix based on crime change quantifies this.
All values are greater than 0.5, indicating strong correlation. However, the highest value, between the adjacent boroughs of Queens and Brooklyn, is only 0.75. Hence, there still seem to be some minor differences in crime rate.
To explore these differences, a forecast comparison using additive decomposition of the time series will be conducted. The first step is to determine if there is a seasonality component in crime change.
All the ACF plots exhibit strong evidence of seasonality, especially on a year-over-year basis. The notable autocorrelations observed at lags of 12, 24, and 36 months further reinforce this finding.
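The seasonal signature described here can be checked numerically by evaluating the sample autocorrelation at multiples of 12 months. The sketch below uses the standard sample ACF estimator on a synthetic series with a 12-month cycle; the series and the function name `acf` are illustrative assumptions, not the report's data or code.

```python
import math


def acf(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Synthetic monthly series (10 years) with a pure 12-month cycle.
series = [math.sin(2 * math.pi * month / 12) for month in range(120)]

# Positive peaks at lags 12, 24, 36 and a trough at lag 6 indicate yearly seasonality.
seasonal = {lag: acf(series, lag) for lag in (6, 12, 24, 36)}
for lag, value in seasonal.items():
    print(f"lag {lag:2d}: {value:+.2f}")
```

The peaks shrink as the lag grows because fewer overlapping pairs enter the numerator, so a slowly decaying pattern of peaks at 12, 24, and 36 is the expected shape for a strongly seasonal series.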
Crime rates seem to rise in January, March, May, and October and fall in February, April, November, and December. More generally, crime rises leading up to the summer months and falls leading up to the winter months. Therefore, it is imperative to decompose the time series into its trend, seasonal, and remainder components to conduct an accurate decomposition forecast.
The time series decomposition clearly depicts the seasonal component and a strong trend component. There also appears to be a significant noise component after 2020. This once again aligns with the COVID-19 global pandemic.
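A classical additive decomposition splits each observation into trend, seasonal, and remainder terms that sum back to the original value. The sketch below is a dependency-free illustration under stated assumptions: it uses a centered 2x12 moving average for the trend and per-month averages for the seasonal term, which may differ from the method the report used. The input series is synthetic.

```python
def decompose_additive(series, period=12):
    """Classical additive decomposition: series = trend + seasonal + remainder.

    Assumes an even period and at least two full periods of data.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        # Centered 2 x period moving average: the two end points get half weight.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Average the detrended values by position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period  # center the pattern so it sums to zero
    pattern = [v - offset for v in raw]

    seasonal = [pattern[t % period] for t in range(n)]
    remainder = [
        series[t] - trend[t] - seasonal[t] if trend[t] is not None else None
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic 4-year monthly series: linear downtrend plus a zero-sum monthly pattern.
monthly_pattern = [3, -2, 4, -3, 5, 1, 2, 0, -1, 2, -5, -6]
series = [100 - 0.5 * t + monthly_pattern[t % 12] for t in range(48)]

trend, seasonal, remainder = decompose_additive(series)
```

On this noise-free input the remainder is numerically zero. On real crime data, a remainder that grows after 2020 would correspond to the noise component noted above. The trend is undefined for the first and last six months because the centered window does not fit there.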
Time Series Forecast
A tidy machine learning workflow is employed for training and testing the time series model. Specifically, the data is split into a training set and a testing set. The training set comprises data prior to 2019, while the testing set covers the year 2019. Later years are omitted from the test set because they display greater volatility and erratic behavior during the COVID-19 global pandemic. Following this, the seasonal naive ARIMA model yields the following results on the test set.
Borough | ME | RMSE | MAPE |
---|---|---|---|
Bronx | -0.016 | 0.074 | 59.38 |
Brooklyn | -0.0018 | 0.060 | 54.07 |
Manhattan | 0.0018 | 0.062 | 67.41 |
Queens | -0.0030 | 0.054 | 46.33 |
Staten Island | -0.021 | 0.13 | 72.54 |
The MAPE and RMSE values indicate that the model performed adequately. MAPE tends to be inflated when the actual values are close to zero, as month-over-month changes often are, and there will always be some natural variation due to external factors outside the scope of the time series. Therefore, a forecast through the end of 2024 is conducted with moderate confidence.
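The three accuracy metrics in the table can be reproduced from any forecast and its held-out actuals. The sketch below pairs them with a plain seasonal naive baseline, which repeats the last observed year. It is not the report's model, and the monthly change values are hypothetical. Errors are taken as actual minus forecast, which is an assumed sign convention.

```python
import math


def seasonal_naive(train, horizon, period=12):
    """Forecast by repeating the last full season observed in the training data."""
    last_season = train[-period:]
    return [last_season[h % period] for h in range(horizon)]


def accuracy(actual, forecast):
    """Mean error, root mean squared error, and mean absolute percentage error."""
    errors = [a - f for a, f in zip(actual, forecast)]  # assumed convention: actual - forecast
    n = len(errors)
    me = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the actual value, so near-zero actuals inflate it.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"ME": me, "RMSE": rmse, "MAPE": mape}


# Hypothetical month-over-month crime change (fractions), two training years.
one_year = [0.04, -0.05, 0.06, -0.03, 0.05, 0.01, 0.02, -0.01, -0.02, 0.03, -0.04, -0.06]
train = one_year * 2
# Hypothetical held-out year, each month within 0.01 of the training pattern.
test = [0.05, -0.04, 0.05, -0.02, 0.04, 0.02, 0.01, -0.02, -0.01, 0.02, -0.05, -0.05]

forecast = seasonal_naive(train, horizon=len(test))
metrics = accuracy(test, forecast)
print(metrics)
```

In this example every monthly error is only 0.01 in magnitude, yet MAPE lands above 40% because the actual values are themselves only a few hundredths. That illustrates why a high MAPE on a change series does not by itself imply a poor forecast.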
The forecast is difficult to interpret given the oscillatory nature of crime. Additionally, since crime is seasonal, a point forecast does not offer much insight. Instead, the cumulative crime change from January 2023 through December 2024 sheds light on which borough is forecast to show the greatest decrease in crime rate.
Borough | Cumulative Crime Change Forecast |
---|---|
Bronx | 0.99 |
Brooklyn | 2.34 |
Manhattan | 1.25 |
Queens | 0.28 |
Staten Island | 0.91 |
The forecast ranks Queens first, with the lowest cumulative crime change (0.28), followed by Staten Island (0.91), the Bronx (0.99), Manhattan (1.25), and Brooklyn (2.34).
Multivariate Crime Analysis
Based on the analysis above, crime rate is influenced by schools and has a cyclical component to it. To further this analysis, the crime rate is compared with a variety of relevant factors. According to the Government of Canada, “social and economic disadvantage has been found to be strongly associated with crime, particularly the most serious offences including assault, robbery, and homicide.” Therefore, median income, poverty rate, and unemployment rate are considered alongside school count and time.
The correlation matrix indicates a moderate absolute correlation between crime, income, and unemployment. However, the remaining variables exhibit weak absolute correlations. This relationship is visually represented in the heatmap for each factor, utilizing data from 2015. Specifically, regions with higher income, lower poverty rates, and lower unemployment tend to experience fewer crimes, and vice versa.
It is crucial to recognize that when these variables are evaluated collectively within a multivariate model, collinearity issues may emerge among them. Variables are added iteratively while monitoring the variance inflation factor, and an abnormally large value is observed for unemployment. Hence, unemployment is dropped, and the variance inflation factors for the remaining variables are shown below.
Variance Inflation Factor Analysis
Income | Poverty | School Count | Year |
---|---|---|---|
1.383294 | 1.300938 | 1.085032 | 1.031472 |
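The variance inflation factor for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. Equivalently, the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix, which is the identity the sketch below relies on. The data values and the helper names are hypothetical, chosen so that unemployment is almost a linear function of poverty.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


# Hypothetical precinct-level predictors (not the report's data).
predictors = {
    "income": [52, 61, 48, 75, 66, 58, 70, 45],
    "poverty": [21, 15, 24, 9, 13, 18, 11, 26],
    "unemployment": [10.4, 7.6, 12.1, 4.4, 6.5, 9.1, 5.4, 13.0],
}
vif_all = vif(predictors)
vif_reduced = vif({k: v for k, v in predictors.items() if k != "unemployment"})
```

In this toy data the unemployment VIF is very large because the column is nearly redundant with poverty, and dropping it lowers the remaining values. Values close to 1, like those in the table above, indicate little collinearity among the retained predictors.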
One important point to highlight is the omission of the precinct variable from the correlation matrix due to its nature as a nominal categorical variable. Put simply, precinct values do not imply any inherent rank or order. Nonetheless, precinct remains a factor in the following forecasting model, as it provides geographical insight.
Multivariate Forecast
A tidy machine learning workflow is employed for training and testing the model. Specifically, the data is split into a training set and a testing set. The training set comprises data from 2010 to 2015, while the testing set covers the period from 2016 to 2017. The cutoff at 2017 is due to the limited availability of school data until then. Following this, the variables are preprocessed (normalized and dummified) before training the model. The multivariate linear regression model yields the following results on the test set.
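The split and preprocessing steps can be sketched as follows. This is an illustration under assumptions rather than the report's workflow: the rows, the field names, and the precinct labels are hypothetical, normalization is taken to mean z-scoring, and dummification drops the first category as the baseline. The normalization parameters are estimated on the training years only, so no information from 2016 to 2017 leaks into the model.

```python
def zscore_params(values):
    """Mean and sample standard deviation used to normalize a numeric column."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return mean, std


def one_hot(value, categories):
    """Dummy-encode a nominal value, dropping the first category as the baseline."""
    return [1.0 if value == c else 0.0 for c in categories[1:]]


# Hypothetical precinct-year rows (not the report's data).
rows = [
    {"year": 2010, "precinct": "A", "income": 50.0, "crime": 900},
    {"year": 2013, "precinct": "B", "income": 70.0, "crime": 600},
    {"year": 2015, "precinct": "C", "income": 60.0, "crime": 750},
    {"year": 2016, "precinct": "A", "income": 55.0, "crime": 850},
    {"year": 2017, "precinct": "B", "income": 75.0, "crime": 580},
]

# Chronological split: train on 2010-2015, test on 2016-2017.
train = [r for r in rows if 2010 <= r["year"] <= 2015]
test = [r for r in rows if 2016 <= r["year"] <= 2017]

# Fit preprocessing on the training set only, then apply it to both sets.
mean, std = zscore_params([r["income"] for r in train])
categories = sorted({r["precinct"] for r in train})


def featurize(row):
    """Normalized income followed by precinct dummy columns."""
    return [(row["income"] - mean) / std] + one_hot(row["precinct"], categories)


X_train = [featurize(r) for r in train]
X_test = [featurize(r) for r in test]
y_train = [r["crime"] for r in train]
y_test = [r["crime"] for r in test]
```

Dummy columns let the linear model learn a separate offset for each precinct without implying any ordering among them, which is why a nominal variable can be included in the model even though it is left out of the correlation matrix.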
Model Evaluation
Metric | Value |
---|---|
Mean Absolute Percentage Error | 21.7692544 |
Root Mean Squared Error | 1309.2306350 |
R-Squared | 0.8964123 |
Mean Absolute Error | 881.7776689 |
Above is a side-by-side comparison of the predicted crime and the actual crime. Visually, the model performs well at predicting crime for each precinct. The evaluation metrics suggest that the model explains a significant portion of the variance in the data (R² ≈ 0.90) and makes predictions with relatively low error, indicating a good fit.
Given the strong performance of the model, a forecast for the 2025 crime rate was conducted using a two-step inference model. First, values for all predictor variables were forecast for 2025. This includes predictions for income, poverty, and school count for each precinct. Subsequently, these estimates were incorporated into the model to predict the crime rate for each precinct for 2025. The forecast is captured below.
The forecast indicates that certain areas are definitely safer than others. Given budgetary constraints, prioritizing these safer areas is recommended.
Conclusion
Based on the analysis above, we can use our forecasts to determine a safe area in NYC to consider relocating to. The general rule of thumb is to pick a location near a school if possible. If choosing a borough is possible, Staten Island is the safest option. Queens is also a great option since it is projected to show the greatest decrease in crime in the near future.
Based on the map above, there are around a dozen areas which have relatively low crime. These areas are fairly spread apart, allowing us to have multiple safe areas in each borough. Some safe areas not too secluded from the city centre include Murray Hill, Cobble Hill, Greenpoint, and the Upper West Side. More areas can be found using the interactive map above, with green sectors highlighting the safest neighborhoods.