I'm Wanyun,
a data scientist.
University of Maryland, Master of Information Systems
• Outstanding Graduate Project Award in DC Data Challenge 2020
• Smith Master Student Association Ambassador
• Terrapin Scholar Award
📍 College Park, MD
📆 December 2020
University of Maryland, Bachelor of Statistics & Applied Economics
• Scholarship for Excellent Academic Performance
📍 College Park, MD
📆 May 2019
Udacity, Nanodegree: Machine Learning Engineer (in progress)
Analyst, Advanced Analytics
...
📍 Boston, MA
📆 August 2021 - Present
Research Assistant in Text Mining & Teaching Assistant
• Scraped 3,000+ announcements from 98 websites across 15 categories using Python BeautifulSoup; conducted exploratory analysis of COVID-19 response trends using word embeddings (GloVe & Word2Vec in R), topic modeling with Seeded LDA, and sentiment analysis with the Bing, AFINN & NRC lexicons; visualized the insights with word clouds, scatter plots & bar charts.
• Created interactive presentations for the Data Mining class using the R xaringan & flipbookr packages and presented them to 200+ students.
• Graded 40+ assignments for the Pricing and Revenue Management MBA class and provided detailed feedback with recommendations.
📍 College Park, MD
📆 November 2020 - March 2021
Machine Learning Researcher
• Processed 200k records of pharmacy sales data covering the previous 2 months; imputed missing data, treated outliers & performed EDA.
• Predicted next week's drug sales for 5 pharmacies; trained & evaluated Support Vector Regression, Random Forest & Gradient Boosting models with scikit-learn in Python, achieving the lowest MAE (19) with the Gradient Boosting model.
📍 College Park, MD
📆 January 2020 - May 2020
Business Analyst Intern
• Sped up clients’ data extraction 1.5× by optimizing the database schema with SQL queries.
• Researched the financial markets & prepared a market analysis report that was published in the State Journal of Banking & Finance.
📍 Xi'an, China
📆 December 2017 – January 2018
Python | R | SQL | Tableau | Power BI | Looker | Git | AWS | Spark | Hadoop | ProjectLibre | Arena | HTML | CSS | JavaScript
Tableau Desktop Specialist, AWS Cloud Practitioner (in progress)
• Cleaned & preprocessed 5,269 Amazon beauty product reviews, tokenized & stemmed keywords, and built a TF-IDF matrix with NLTK.
• Predicted customer sentiment with a Random Forest (scikit-learn) and an LSTM (PyTorch) in Python, achieving 94% and 90% accuracy, respectively.
• Analyzed 20+ Airbnb property features correlated with high-profit conversion rates and provided actionable insights & recommendations.
• Collected geolocation, demographic & public infrastructure resource data to capture potential community features.
• Applied logistic & Lasso regressions and tree ensemble models using the caret & tidymodels libraries in R; tuned hyperparameters and achieved the highest specificity, 97.55%, with a Random Forest model at a 0.69 threshold on the Receiver Operating Characteristic (ROC) curve.
• Generated a variable importance plot and quantified the impact of each significant feature on the high booking rate.
• Analyzed unusual historical changes in air quality by combining sensor records with geolocation and demographic data.
• Decomposed the time-series data, built a predictive model, and visualized the results using the fpp and Predict packages.
• Analyzed and interpreted the relationships between various factors and trip durations.
• Visualized peak-hour trends on weekends and weekdays.
• Mapped popular locations and analyzed the underlying reasons for their popularity.
• Scraped Pokémon information from the official website using the Beautiful Soup package.
• Analyzed Pokémon strength by type and location in Pokémon X and Y.
• Collected information from 17 documents, designed a 5-table ER diagram in Lucidchart, determined functional dependencies, performed normalization, defined the business rules, and tested common use cases with SQL queries.
• Connected the database, populated with sample data, to a PHP web application and verified the system's functionality.
• Held weekly meetings to understand the client's business requirements for the information system.
• Designed the ER diagram, defined the business process workflow, and implemented the system.
• Built a workflow model of a hotpot restaurant using modules such as Assign, Decide, Process, Record, and Hold.
• Conducted PAN analysis to compare solutions across scenarios with different resource allocations.
• Modeled the workflow of a bike-sharing system between stations using different types of modules.
• Ran tests to determine the optimal number of bicycle holders and bicycles at the selected stations.