Problem
We recently had an interesting conversation with Money Lover, a personal money and finance management app. Each time users spend money, they record a short note in the app. A financial management app like Money Lover wants to know which product or service categories its users are spending money on.
Fortunately, we have a good classification method for short messages. Let’s see how it works.
Solution
1. Prepare catalog tree
The catalog tree that Money Lover shared with us is great; we think it is well suited to financial management.
In this post, we introduce the classification method and demo it on the parent categories with Vietnamese messages.
2. Mining linguistic data for categories
Using some data mining techniques, we can extract text and keywords related to each category, for example:
“auto_and_transport”: uber, grab, taxi, vnairlines, xăng, gửi xe, …
“food_and_dining”: metro, bigc, kfc, pizza, thịt, cá, trứng, …
“entertainment”: cgv, platinum, xem phim, game, quẩy, …
…
One of our techniques for extracting text/keywords for a category was introduced in this blog post.
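The mining technique itself is covered in the linked post, so the sketch below is only a simple stand-in, not our actual method. It assumes we already have a few texts grouped by category (the toy `CORPUS` here is made up for illustration) and ranks each word by the share of its occurrences that fall in that category, so that words common to every category, such as "mua" (buy), drop down the list. The function name `mine_keywords` and the `top_n` parameter are hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus, invented for this example: texts grouped by category.
CORPUS = {
    "auto_and_transport": ["đi taxi", "mua xăng", "taxi về nhà"],
    "food_and_dining": ["mua thịt", "mua pizza", "ăn pizza"],
}

def mine_keywords(corpus, top_n=3):
    """Return the top_n most category-specific words for each category."""
    # Word counts inside each category.
    per_category = {
        label: Counter(word for text in texts for word in text.lower().split())
        for label, texts in corpus.items()
    }
    # Word counts across all categories.
    overall = Counter()
    for counts in per_category.values():
        overall.update(counts)

    keywords = {}
    for label, counts in per_category.items():
        # Rank by the share of a word's occurrences that belong to this
        # category, breaking ties by the raw count inside the category.
        ranked = sorted(
            counts,
            key=lambda w: (counts[w] / overall[w], counts[w]),
            reverse=True,
        )
        keywords[label] = ranked[:top_n]
    return keywords

keywords = mine_keywords(CORPUS)
print(keywords)
```

With this toy data, "taxi" and "pizza" come out on top of their categories, while "mua", which appears in both, does not make the list for either.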
3. Using mined data for Naive Bayes classification
We love the Naive Bayes technique: it is simple and extremely effective for short messages, where the keywords decide the meaning of the message. We described this method in more detail in this post.
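To make the idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not our production code. The keyword lists are copied from the examples above; the function names `train` and `classify`, the uniform prior over categories, and the whitespace tokenizer are assumptions made for illustration.

```python
import math
from collections import Counter

# Keyword lists taken from the examples in step 2 (used here as toy training data).
KEYWORDS = {
    "auto_and_transport": ["uber", "grab", "taxi", "vnairlines", "xăng", "gửi xe"],
    "food_and_dining": ["metro", "bigc", "kfc", "pizza", "thịt", "cá", "trứng"],
    "entertainment": ["cgv", "platinum", "xem phim", "game", "quẩy"],
}

def train(keywords):
    """Count tokens per category and collect the shared vocabulary."""
    counts = {
        label: Counter(tok for phrase in phrases for tok in phrase.split())
        for label, phrases in keywords.items()
    }
    vocab = {tok for c in counts.values() for tok in c}
    return counts, vocab

def classify(message, counts, vocab):
    """Return the category with the highest log-likelihood for the message."""
    # Tokens never seen in training carry no information, so skip them.
    tokens = [t for t in message.lower().split() if t in vocab]
    scores = {}
    for label, c in counts.items():
        # Add-one (Laplace) smoothing keeps unseen words from zeroing a category.
        denom = sum(c.values()) + len(vocab)
        # Uniform prior assumed, so only the likelihood terms are summed.
        scores[label] = sum(math.log((c[t] + 1) / denom) for t in tokens)
    return max(scores, key=scores.get)

counts, vocab = train(KEYWORDS)
print(classify("đi taxi về nhà", counts, vocab))  # auto_and_transport
```

Because short notes usually contain one or two strong keywords, a single matching token such as "taxi" is enough to tip the score toward the right category.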
Here comes the result
We have a working demo here: just enter some text and see the result. Keep the context of the messages in mind: they are short financial notes.
Message: “an trua” – (ăn trưa – lunch)
{
"labels": [
"food_and_dining"
]
}
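Note that the demo message is typed without Vietnamese diacritics ("an trua" for "ăn trưa"), which is how many people write quick notes. This post does not describe how the demo handles that. One plausible approach, shown here purely as an assumption, is to strip diacritics from both the mined keywords and the incoming message before matching. The helper name `strip_accents` is hypothetical.

```python
import unicodedata

def strip_accents(text):
    """Remove Vietnamese diacritics so accented and unaccented input compare equal."""
    # "đ" has no canonical decomposition, so map it by hand.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn").
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

print(strip_accents("ăn trưa"))  # an trua
```

The trade-off is that distinct words can collapse into the same unaccented form, so this kind of normalization suits short notes where the surrounding keywords still make the category clear.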
Future work
The current input to this classifier is linguistic data from social channels (because the context of financial messages is similar to that of ordinary text). One important data channel is not yet used: real user feedback on the classification results.
In the future, we need to listen to feedback from users who have seen and rechecked the classification results, then add that feedback as training data so that the classifier gets better day by day.
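One reason this fits Naive Bayes well is that the model is just a set of counts, so a user correction can be folded in by incrementing them, with no full retraining. The sketch below shows that idea under our own assumptions: the class name `FeedbackNaiveBayes`, its methods, and the sample messages are hypothetical, not an existing implementation.

```python
import math
from collections import Counter, defaultdict

class FeedbackNaiveBayes:
    """Naive Bayes over word counts that can learn one message at a time."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of messages seen
        self.vocab = set()

    def learn(self, message, label):
        """Add one labeled message, e.g. a classification confirmed or fixed by a user."""
        tokens = message.lower().split()
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def classify(self, message):
        tokens = [t for t in message.lower().split() if t in self.vocab]
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior from message counts plus smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

clf = FeedbackNaiveBayes()
clf.learn("taxi grab uber", "auto_and_transport")
clf.learn("kfc pizza", "food_and_dining")

# A user sees a wrong label for "highlands" and corrects it.
clf.learn("highlands", "food_and_dining")
print(clf.classify("highlands"))  # food_and_dining
```

In practice we would probably also want to weight or review feedback before trusting it, since a single mistaken correction changes the counts just as much as a good one.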