Hello

I'm Wanyun,
a data scientist.

Education


University of Maryland

Master of Information Systems

* Outstanding Graduate Project Award in DC Data Challenge 2020

* Smith Master Student Association Ambassador

* Terrapin Scholar Award

📍 College Park, MD

📆 Dec 2020

University of Maryland

Bachelor of Statistics & Applied Economics

* Scholarship for Excellent Academic Performance

📍 College Park, MD

📆 May 2019

Udacity

Nanodegree: Machine Learning Engineer

* In progress

Work Experience


Heyday

Analyst, Advanced Analytics

...

📍 Boston, MA

📆 August 2021 - Present

University of Maryland

Research Assistant in Text Mining & Teaching Assistant

• Scraped 3,000+ announcements from 98 websites in 15 categories using Python BeautifulSoup; conducted exploratory analysis of COVID-19 response trends using word embeddings (GloVe & Word2Vec in R), topic modeling with Seeded LDA, and sentiment analysis with the Bing, AFINN & NRC lexicons; visualized the insights with word clouds, scatter plots & bar charts.

• Created interactive presentations for the Data Mining class using the R xaringan & flipbookr packages and presented them to 200+ students.

• Graded 40+ assignments for the Pricing and Revenue Management MBA class and provided detailed feedback with recommendations.

📍 College Park, MD

📆 November 2020 - March 2021

Center for Health Information & Decision Systems

Machine Learning Researcher

• Processed 200k records of pharmacy sales data for the past 2 months, imputed missing data, treated outliers & performed EDA.

• Predicted next week's drug sales for 5 pharmacies; trained & evaluated Support Vector Regressor, Random Forest & Gradient Boosting models using scikit-learn in Python, achieving the lowest MAE of 19 with the Gradient Boosting model.

📍 College Park, MD

📆 January 2020 - May 2020

Bank of China

Business Analyst Intern

• Boosted the speed of clients’ data extraction 1.5x with an optimized database schema, using SQL queries.

• Researched the financial markets & prepared a market analysis report that was published in the State Journal of Banking & Finance.

📍 Xi'an, China

📆 December 2017 - January 2018

Skills


Tools:

Python | R | SQL | Tableau | Power BI | Looker | Git | AWS | Spark | Hadoop | ProjectLibre | Arena | HTML | CSS | JavaScript

Certifications:

Tableau Desktop Specialist, AWS Cloud Practitioner (In progress)

Projects


Python: Sentiment Analysis of Amazon Reviews by Using Random Forest and Recurrent Neural Network (LSTM)

• Cleansed & preprocessed 5,269 Amazon beauty product reviews, stemmed & tokenized keywords, and built a TF-IDF matrix with NLTK.

• Predicted customer sentiment using Random Forest & LSTM models from the scikit-learn & PyTorch packages in Python, achieving 94% & 90% accuracy, respectively.

Details

R: Explanatory Model and Predictive Model for Achieving High Airbnb Booking Rate

• Analyzed 20+ Airbnb property features correlated with high-profit conversion rates, provided actionable insights & recommendations.

• Collected geolocation, demographic & public infrastructure resource data to capture potential community features.

• Applied Logistic & Lasso regressions and Tree Ensemble models using the caret & tidymodels libraries in R, tuned hyperparameters, and achieved the highest specificity of 97.55% with a Random Forest model at a 0.69 threshold on the Receiver Operating Characteristic curve.

• Generated the Variable Importance Plot & quantified the impact of each significant feature on the high booking rate.

Details

Python, R: Air Quality Analysis and Prediction in the DMV Area

• Analyzed unusual historical changes in air quality data by combining sensor records, geolocation, and demographic data.

• Decomposed time-series data, built a predictive model, and visualized it using the Fpp and Predict packages.

Details

Python: NYC Taxi Trips Analysis - Over 1 Million Records

• Analyzed and interpreted the relationship between different factors and trip durations.

• Visualized and captured the trends of peak hours on weekends and weekdays.

• Mapped popular locations and analyzed the reasons behind their popularity.

Details


Python: Pokemon Game Analysis by Web Scraping and Data Aggregation

• Scraped Pokemon information from the official website using the Beautiful Soup package.

• Analyzed the power of Pokemon by type and location in the Pokemon XY game.

Details

SQL, Lucidchart: Information System Design and Implementation for Silver Oaks Cooperative School

• Collected information from 17 documents, designed the ER Diagram with 5 tables using Lucidchart, determined functional dependencies, performed normalization, defined the business rules, and exercised common use cases with SQL queries.

• Connected the database, loaded with sample data, to the PHP web application and verified that the system functioned correctly.

Details

Lucidchart: System Design for Yeshili Reflective Material

• Conducted weekly meetings to understand the client's business requirements for the information system.

• Designed the ER Diagram, defined the workflow through the business process, and implemented the system.

Details

Arena: Simulation Model for a Local Hotpot Restaurant and PAN Analysis

• Built a workflow model for a hotpot restaurant using various modules, such as Assign, Decide, Process, Record, and Hold.

• Conducted PAN Analysis to propose solutions for varied scenarios by assigning the resources differently.

Details

Arena: Bicycle-sharing System Design and Analysis

• Modeled the workflow of a bicycle-sharing system between stations using different types of modules.

• Tested iteratively to determine the optimal number of bicycle holders and bicycles at the selected stations.

Details

Contact


wanyun0403@gmail.com


LinkedIn | GitHub | Tableau Public

© 2020 Wanyun Yang. All rights reserved.