Projects finished & in progress

ICM Weather Project (Ongoing):

Data from weather stations is used for compression of model output files and for prediction of variables such as temperature at a future point in time.

Datarevenue Feature Engineering Database (Ongoing):

Developed a database of feature engineering methods for various data types, including functions for categorical encoding, text cleaning and text processing, among others. Around 50 functions were prepared. To be published soon.
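As an illustration of the kind of entries such a collection contains, here are two minimal sketches (the function names are illustrative, not the published API): a frequency encoder for categorical columns and a basic text cleaner.

```python
import re
from collections import Counter

def frequency_encode(values):
    """Replace each categorical value with its occurrence count in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]

def clean_text(text):
    """Lowercase, strip non-alphanumeric characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```

Frequency encoding is a common way to turn high-cardinality categoricals into a single numeric feature without one-hot blow-up.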


Kaggle Profile - Wrosinski

Achieved Competitions Master tier.

Selected Achievements:

  • Quora Question Pairs - NLP, 14th out of 3307 (top 1%), Gold medal
  • Intel & MobileODT Cervical Cancer Screening - Object Detection, 18th out of 848 (top 3%), Silver medal
  • Nature Conservancy Fisheries Monitoring - Object Detection, 30th out of 2293 (top 2%), Silver medal
  • Allstate Claims Severity - Feature Engineering & Ensembling, 99th out of 3055 (top 4%), Silver medal

Computer Vision:

  • Intel & MobileODT Cervical Cancer Screening:

    Finished 18th out of 848 (top 3%).

    The goal of the competition was to identify a woman’s cervix type from images, in order to help choose the proper treatment. State-of-the-art detectors were used to find regions of interest in cervical screening images; the selected regions were then fed to classification models pretrained on ImageNet. Data augmentation and oversampling techniques were additionally used as a remedy for class imbalance and the small dataset size.
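The oversampling idea can be sketched as random duplication of minority-class examples until every class matches the majority count (a dependency-free toy version, not the competition code):

```python
import random

def oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes
    reach the size of the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for y, group in by_class.items():
        resampled = group + [rng.choice(group) for _ in range(target - len(group))]
        out_samples.extend(resampled)
        out_labels.extend([y] * target)
    return out_samples, out_labels
```

In practice this is usually combined with augmentation so the duplicated images are not pixel-identical.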

  • Nature Conservancy Fisheries Monitoring:

    Top 2% finish (30th out of 2293 competitors) in a team of two.

    Used a state-of-the-art CNN detection model (YOLO v2) to detect fish in images acquired from cameras placed on boats. Pretrained CNN models (VGG, Inception, ResNets) were then fine-tuned on the cropped fish images to classify the species.
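The hand-off between the detection and classification stages can be sketched as thresholding detections and cropping them out for the classifier (a toy version with the image as a 2-D list of pixel rows; the actual pipeline consumed YOLO v2 outputs):

```python
def crop_detections(image, boxes, min_conf=0.5):
    """Keep detections above a confidence threshold and crop them from
    the image, so each crop can be passed to a classification model.
    `image` is a list of pixel rows; each box is (x, y, w, h, confidence)."""
    crops = []
    for x, y, w, h, conf in boxes:
        if conf < min_conf:
            continue
        crops.append([row[x:x + w] for row in image[y:y + h]])
    return crops
```

Cropping to the detected region lets the classifier spend its capacity on the fish rather than on the boat deck.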

  • Data Science Bowl 2017:

    Top 11% finish (209th out of 1972 competitors).

    The aim of the project was to detect lesions in patients’ lungs and classify whether they were malignant. The data consisted of 3D CT scans. 2D and 3D image segmentation CNNs were built on the U-Net and 3D U-Net architectures to predict candidate lesion locations, which were then fed into 2D and 3D convolutional classification models whose task was to assess whether the lesions might be malignant. Github
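The hand-off from segmentation to classification amounts to cutting fixed-size patches around candidate locations out of the scan. A simplified 3-D sketch with nested lists (names are illustrative, not the project code):

```python
def extract_patch(volume, center, size):
    """Cut a cubic patch of side `size` centred on `center` from a 3-D
    volume indexed [z][y][x]; candidate lesion patches like this would
    be fed to the downstream classification CNN."""
    half = size // 2
    cz, cy, cx = center
    return [[row[cx - half:cx + half]
             for row in plane[cy - half:cy + half]]
            for plane in volume[cz - half:cz + half]]
```

Working on small patches keeps the 3-D classifier’s memory footprint manageable compared with feeding whole CT volumes.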


  • Quora Question Pairs:

    Finished 14th out of 3307 (top 1%) in a team of three.

    Similarity detection methods combining state-of-the-art neural models (Decomposable Attention, Siamese RNNs) with various feature engineering and feature extraction methods were used to assess whether a pair of questions were duplicates. Ensembling methods were then used to combine the classifiers into a meta-model. Github
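A much simpler similarity feature of the kind used on the feature-engineering side can be sketched as a bag-of-words cosine between the two questions (the neural models themselves are far more involved; this helper is illustrative only):

```python
import math
from collections import Counter

def bow_cosine(q1, q2):
    """Cosine similarity between bag-of-words vectors of two questions."""
    a, b = Counter(q1.lower().split()), Counter(q2.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Cheap lexical-overlap features like this often complement neural similarity scores in an ensemble.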

Structured Data:

  • Allstate Claims Severity:

    Finished in the top 4% (99th out of 3055 competitors).

    Developed a model to predict claims severity in a competition hosted by Allstate; it was only 0.32% less accurate than the winning one. An ensemble of XGBoost, LightGBM and DNN models was trained on differently processed versions of the data in order to uncover different relationships. The models were then stacked: 2nd- and 3rd-level meta-models (DNN, VW, FM) were trained on the lower-level models’ predictions, making use of each 1st-level model’s different strengths.
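The core of any such stacking setup is out-of-fold prediction, so the meta-model only ever sees predictions a model made for rows it was not trained on. A dependency-free sketch (`model_fit_predict` stands in for any 1st-level learner; this is the general technique, not the competition code):

```python
def out_of_fold_predictions(model_fit_predict, X, y, n_folds=5):
    """Build out-of-fold predictions: each row's prediction comes from a
    model that never saw that row in training, so the meta-model trained
    on top is not fed leaked information."""
    n = len(X)
    oof = [None] * n
    for k in range(n_folds):
        val_idx = [i for i in range(n) if i % n_folds == k]
        trn_idx = [i for i in range(n) if i % n_folds != k]
        preds = model_fit_predict([X[i] for i in trn_idx],
                                  [y[i] for i in trn_idx],
                                  [X[i] for i in val_idx])
        for i, p in zip(val_idx, preds):
            oof[i] = p
    return oof
```

Stacking the out-of-fold columns of several diverse 1st-level models gives the 2nd-level meta-model a leak-free training matrix.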

  • Santander Product Recommendation:

    Finished in the top 7%.

    Created a model to recommend products to Santander customers: based on users’ previous purchases, the products for the current period were predicted. LightGBM was used as the main prediction algorithm. Various feature engineering and data cleaning methods were employed to help discover important relationships that were initially hidden in the raw data.
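One family of features common in this kind of setup is flags for what a user bought in the previous period (a hypothetical helper for illustration, not the competition code):

```python
def last_period_flags(history, user, period, products):
    """Binary features: did `user` own/buy each product in the period
    before `period`?  `history` maps (user, period) to a set of products."""
    prev = history.get((user, period - 1), set())
    return [1 if p in prev else 0 for p in products]
```

Purchase-history features like these are typically what gradient-boosted models such as LightGBM pick up on most strongly in recommendation tasks.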

  • Instacart Market Basket Analysis:

    Finished in the top 8%.

    Created a model to predict the products customers would choose for their next order. Various feature engineering methods were used to turn the raw data into meaningful features, after which a tuned LightGBM model produced the final predictions.