Yifan Li
Address: 444 Washington Blvd, Jersey City, NJ 07310
Phone: +1 (608)-216-5993
Email: ivanlee142857@gmail.com
Education
University of Connecticut, Connecticut, USA
PhD, Statistics (Sept 2018 – Sept 2023)
Coursework: Analysis of Survival Data, Bayesian Data Analysis, Computational Method for Optimization, Financial Data Mining, Bayesian Decision, Applied Multivariate Analysis, Linear Statistical Model
University of Wisconsin-Madison, Wisconsin, USA
Master of Science, Statistics (Sept 2016 – May 2018)
Major GPA: 3.87/4 | Overall GPA: 3.77/4
Coursework: Survival Analysis, Stochastic Modeling, Classification and Regression Tree, Statistical Method, Mathematical Statistics, Machine Learning, Multilevel Models, Design of Experiments
Nanjing University, Jiangsu, China
Bachelor of Science, Statistics (Sept 2013 – May 2017)
Coursework: Mathematical Analysis, Higher Algebra, Discrete Mathematics, Ordinary Differential Equation, Partial Differential Equation, Function of Complex Variable, Stochastic Process, Real Analysis
Award: People’s Scholarship
Professional Experience
Ernst & Young U.S. LLP (Quantitative Trading Book), New York, USA
Senior Consultant (Oct 2023 – Present)
- Modular Redesign of Derivatives Pricing Algorithm
- Led the architectural overhaul, decomposing the algorithm into a service class and separate analysis units to decouple the code.
- Enabled independent updates to each component without affecting the overall system, significantly reducing redundancy and enhancing maintainability.
- Designed and implemented unit-testing frameworks that identify potential errors early, making debugging more reliable.
- Optimization of American Options Pricing
- Applied the American Monte Carlo (AMC) method to price American options, replacing the computationally intensive Monte Carlo over Monte Carlo method.
- Reduced computational complexity from O(n²) to O(n), cutting pricing time and saving considerable computational resources.
- Equity Derivatives Pricing Algorithm Enhancement
- Improved the pricing framework for equity derivatives by transitioning from a market-based risk model to an underlying location-based risk analysis, enhancing accuracy and interpretability.
- Integrated machine learning techniques, such as LSTM and random forest models, with traditional MCMC methods to price derivatives, enabling the pricing of complex exotic options with more than three underlying assets.
- Counterparty Credit Risk Monitoring
- Employed SFT VaR-based models to calculate and monitor Counterparty Credit Risk.
- Interpreted complex data and model results, delivering clear insights to stakeholders, including cross-disciplinary teams and non-technical audiences.
- Regularly updated model parameters in line with evolving market data, ensuring the models reflect current market conditions and deliver accurate risk assessments.
Bank of China International Holdings Limited, Shanghai, China
Securities Analyst Assistant (Intern) (June 2021 – Sept 2021)
- Focused on the battery and new energy industries. Predicted the short- and long-term stock performance of related companies using a time series model with a spike-and-slab error.
- Adjusted the predictions under a multinomial model based on the performance of correlated companies, avoiding the over-optimistic forecasts of the previous model.
Huatai Securities Co., Ltd., Jiangsu, China
Data Analyst (Intern) (July 2017 – Sept 2017)
- Used unsupervised learning to identify visitors with a strong desire to buy products, based on their records in the company’s app.
- Cleaned and reshaped 17 million visitor records by aggregating operations from the same visitor.
- Extracted useful variables with principal component analysis (PCA).
- Divided visitors into five groups with K-means clustering and labeled each visitor by group.
- Fitted a decision tree on the labeled data that tags a new visitor within 20 seconds, against a target of 1 minute.
University of Connecticut Statistical Consulting Group, Connecticut, USA
Project Leader (Sept 2020 – Oct 2023)
- Comparison of Cooling Method Effects through Cluster Analysis
- Compared the effects of different cooling methods on body temperature and heart rate, using time series responses.
- Standardized units and removed individual differences by using ratios of the dependent variables.
- Grouped the data with K-means clustering and paired the groups with prior information.
- Relabeled dependent variables based on cooling and heating procedures to avoid confounding, and redesigned the hypothesis tests to avoid drawing opposite conclusions.
- Recommendation of Artificial Leg with Feature Extraction Method
- Recommended the best type of artificial leg for certain patients based on physiological and psychological tests.
- Applied a PCA transformation to highly linearly correlated psychological test variables to find interpretable criteria.
- Used the Lasso to select physiological tests, helping clients avoid wasting funds on unnecessary tests.
- Benchmarked against an XGBoost model; our model is interpretable while achieving similar prediction accuracy.
- Credit Card Approval with Unbalanced Data and Outliers
- Decided which applicants to approve or decline for credit based on historical repayment records.
- After checking the randomness of the missing values, added missing-indicator variables before applying imputation methods.
- Generated features based on the distribution of outliers and assigned different weights to the unbalanced responses.
- Fitted logistic regression, XGBoost, and random forest models separately and, after cross-validation, used a linear combination of the three as the final model.
- Yelp Reviews Rating Prediction
- Predicted ratings for 1 million unlabeled Yelp text reviews.
- Cleaned 1.5 million Yelp reviews by removing non-English comments, abbreviations, and spelling mistakes.
- Extracted positive/negative words based on their relative frequency in differently rated reviews to avoid over-weighting everyday words like “the” or “a.”
- Converted text reviews into vectors using Sentence-To-Vector and generated new features from positive/negative words.
- Fitted a long short-term memory (LSTM) neural network to the pre-processed data and achieved a root-mean-square error of 0.6.
Doctoral Thesis
- Item Response Theory Model:
- Estimated each individual’s ability and each item’s difficulty based on performance on several items.
- Extended the logistic regression model to an Item Response Theory model with a power parameter that controls the skewness of the link function.
- Combined slice sampling and Gibbs sampling (MCMC) to estimate the variables of interest.
- Halved the prediction error compared with a standard logistic regression model.
- Joint Model of Item Response and Response Time:
- Estimated each individual’s ability based on both item response (IR) and response time (RT).
- Fitted separate logistic and linear regressions for IR and RT, and combined them through a nonparametric Dirichlet process prior on each individual’s ability, which removes the normality assumption.
- Estimated each individual’s ability with Hamiltonian Monte Carlo and clustered individuals using the patterns from the Dirichlet process.
- Joint Model of Longitudinal Item Response and Survival Time:
- Examined the trend of each individual’s ability over time and its effect on response time.
- Treated each individual’s ability as longitudinal and estimated it with a forward and backward forecasting method.
- Modeled response time with a Cox proportional hazards model, fitted by the semiparametric partial likelihood method.
- Estimated all unknown parameters with a stochastic gradient descent algorithm.
Skills
- Languages: Mandarin Chinese (Native), English
- Coding/Database:
- Proficient in R, Python, GitHub, LaTeX, NIMBLE, JAGS, HPC.
- Familiar with SQL, SAS, MATLAB, C++, Julia.
- Certification: CFA Level 1