Alan Smith

Linsi Lin

Education

San Francisco State University

San Francisco, California USA

Master of Science in Statistical Data Science   Spring 2020-Fall 2021
Master of Science in Quantitative Economics   Spring 2018-Spring 2021
GPA 3.98/4.0

Fujian Normal University

Fuzhou, China

Bachelor of Arts (B.A.) in English   Fall 2010-Spring 2014


Work Experience

Data Scientist II, GEICO, Chevy Chase, Maryland

01/2023 - Present

• Spearheaded the development of innovative machine learning and AI initiatives, including the creation of an intelligent knowledge assistant bot, next-best-action (NBA) recommendation system, and real-time customer intent detection and entity extraction models (RIDE) for GEICO’s customer-facing virtual assistant. These contributions were pivotal in realizing over $20 million in net benefits and cost savings.
• Applied a range of machine learning methods, including natural language processing (NLP) and generative AI for conversational applications, churn prediction models, tenure regression analysis, clustering for in-depth customer profiling, and reinforcement learning for dynamic strategy optimization.
• Worked closely with cross-functional teams of data scientists, engineers, and analysts to deliver the RIDE project a month ahead of a stringent deadline. Concurrently, achieved a 97% accuracy rate in customer intent detection using NLP and classification methods, significantly outperforming third-party benchmarks of 85%. Delivering a faster and more accurate solution facilitated cost savings of approximately $1.5 million.
• Designed and executed automated data pipelines to support model training, scoring, monitoring, and backtesting, ensuring the scalability and efficiency of machine learning operations across the board.
• Provided expertise and technical assistance across projects, drawing on in-depth knowledge of the insurance domain, infrastructure, and technical aspects to support the team and improve project outcomes and operational efficiency.
• Fostered strong collaboration with various departments including Machine Learning Orchestration (M20), Machine Learning Operations (MLOps), Marketing, Product, and IT, guaranteeing seamless integration and delivery of machine learning projects.
• Actively engaged with stakeholders through regular updates, leveraging business insights to inform project direction. Delivered clear, accessible presentations and conducted comprehensive cost-benefit analyses to align machine learning initiatives with GEICO's strategic goals and maximize project value.

Data Scientist I, GEICO, Chevy Chase, Maryland

10/2021 - 01/2023

• Contributed to the development of the Customer Lifetime Value project, which encompassed 11 models tailored to four key customer segments, employing models focused on tenure, premium, and closure—with one segment not involving closure. This effort steered marketing strategies and optimized paid search investments by aligning ad spending with precise customer value predictions, significantly boosting GEICO's profits by tens of millions of dollars.
• Executed in-depth exploratory data analysis (EDA) to derive actionable insights, steering data-driven decisions.
• Contributed to feature development focused on customer web behavior, insurance policies, claim history, communication history, risk profile, and more, building a multifaceted customer profile that supported accurate prediction of customer lifetime value.

Data Scientist Intern, EMCOR Group, Remote

05/2021-10/2021

• Transformed business needs into data science solutions, streamlined data handling using SQL, and enhanced data quality.
• Leveraged Tableau and R Plotly for insightful visualizations, employed causal inference in STATA to pinpoint drivers of customer complaints, and utilized advanced forecasting techniques (including ETS, ARIMA, Prophet, XGBoost) to predict complaints across 8000+ locations, reducing escalation rates by 36%. Continually optimized models through feature engineering and hyperparameter adjustments.
• Developed a dynamic Tableau dashboard for real-time analytics access, and effectively communicated findings to all management levels.

Office Assistant, Bernstein Realty, San Francisco, CA

12/2015-07/2017

• Greeted visitors, answered phones, replied to emails, sorted and sent mail.
• Responded to tenant inquiries and requests, processed work orders and followed through, coordinated tenant move-ins and move-outs.
• Worked on other projects as assigned, such as accounting and bookkeeping.

Translator, Hawkshield, Fresno, CA

02/2015-08/2015

• Provided Chinese-English verbal and written translation.

Translator, Skilogik, Sanya, Hainan Province, China

03/2014-01/2015

• Provided Chinese-English verbal and written translation.

Projects

Image Classification and Recommender System

• Scraped data from Williams Sonoma websites and classified product images into 10 categories using a convolutional neural network, achieving about 89% accuracy.
• Built an image-based recommender system and a title-based recommender system using transfer learning, feature extraction, and natural language processing. Explored early applications of AI-based image matching in online shopping, adding visual discovery to traditional search paradigms.

Wish Summer Sales Prediction

• Predicted sales for the e-commerce platform Wish, reducing mean absolute error to within 1,150 units, given that the top 6 sales figures were 100,000 units.
• Conducted interactive data visualization and statistical inference.
• Implemented 3 model selection methods (backward elimination, recursive feature elimination, and LassoCV) and 13 machine learning algorithms, and compared model performance.
• Pinpointed the product characteristics most associated with high sales on Wish.

Infant Birth Weight Prediction

• Predicted infant birth weight using 13 linear and non-linear regression methods, such as Ridge and Lasso regression, PCA, bagging, and boosting, on a large dataset with 101,399 observations and 24 predictors.
• Predicted infant birth weight with a mean squared error within one pound.
• Checked model assumptions (heteroscedasticity, collinearity, and skewness) using statistical tests.
• Illustrated the significance of each variable for infant birth weight prediction.

Store Database System

• Built a store database system that allows both the company and buyers to enter and retrieve information.
• Started from business use cases, drew an entity-relationship diagram, designed the relational model, and created a real application in Python.

Income Classification

• Classified individuals into income categories around a $50K threshold based on various demographic characteristics.
• Addressed class imbalance using weighted machine learning algorithms and data resampling, and evaluated models with precision, recall, F1 score, and AUC.

Key Skills

  • Microsoft Azure
  • AML
  • ADO
  • Databricks
  • AKS
  • Python
  • R
  • SQL
  • Java
  • Tableau
  • Power BI
  • Streamlit
  • STATA
  • MATLAB
  • Econometrics
  • Time Series
  • Scikit-learn
  • TensorFlow
  • PyTorch
  • RL
  • BERT
  • NLP
  • LangChain
  • Deep Lake
  • Generative AI
  • Git
  • CI/CD
  • A/B testing
  • Backtesting
  • Azure DevOps
  • Slack
  • Microsoft Teams
  • Dimensionality Reduction
  • Confluence
  • Agile & Scrum
  • Anomaly Detection
  • Recommender Systems

Core Courses

  • Theory and Applications of Statistical and Machine Learning
  • Pattern Analysis and Machine Intelligence
  • Probability and Statistics
  • Design and Analysis of Experiments
  • Computational Statistics
  • Categorical Data Analysis
  • Advanced Probability Models
  • Multivariate Statistical Methods
  • Advanced Econometric Methods and Applications
  • Data Mining
  • Applied Time Series Econometrics
  • Mathematical Economics
  • Microeconomics Theory
  • Macroeconomics Theory
  • Database System