4 2012 [Doi 10.1016_j.elerap.2012.02.004] Keunho Choi; Donghee Yoo; Gunwoo Kim; Yongmoo Suh -- A Hybrid Online-product Recommendation System- Combining Implicit Rating-based Collaborative Filtering and Sequential Patte

A hybrid online-product recommendation system: Combining implicit rating-based collaborative filtering and sequential pattern analysis Keunho Choi a , Donghee Yoo b , Gunwoo Kim c,⇑ , Yongmoo Suh a a Business School, Korea University, Anam-Dong 5-Ga, Seongbuk-Gu, Seoul, Republic of Korea b Department of Electronics Engineering & Information Science, Korea Military Academy, Gongneung 2-Dong, Nowon-Gu, Seoul, Republic of Korea c College of Business & Economics, Hanbat National University, San 16-1
of 9
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Similar Documents
  A hybrid online-product recommendation system: Combining implicitrating-based collaborative filtering and sequential pattern analysis Keunho Choi a , Donghee Yoo b , Gunwoo Kim c, ⇑ , Yongmoo Suh a a Business School, Korea University, Anam-Dong 5-Ga, Seongbuk-Gu, Seoul, Republic of Korea b Department of Electronics Engineering & Information Science, Korea Military Academy, Gongneung 2-Dong, Nowon-Gu, Seoul, Republic of Korea c College of Business & Economics, Hanbat National University, San 16-1, Duckmyoung-Dong, Yuseong-Gu, Daejeon, Republic of Korea a r t i c l e i n f o  Article history: Received 19 April 2011Received in revised form 16 February 2012Accepted 16 February 2012Available online 24 February 2012 Keywords: Collaborative filteringSequential pattern analysisImplicit ratingRecommendationHybrid approach a b s t r a c t Many online shopping malls in which explicit rating information is not available still have difficulty inproviding recommendation services using  collaborative filtering   (CF) techniques for their users. Applyingtemporal purchase patterns derived from  sequential pattern analysis  (SPA) for recommendation servicesalso often makes users unhappy with the inaccurate and biased results obtained by not considering indi-vidual preferences. The objective of this research is twofold. One is to derive implicit ratings so that CFcan be applied to online transaction data even when no explicit rating information is available, andthe other is to integrate CF and SPA for improving recommendation quality. Based on the results of sev-eral experiments that we conducted to compare the performance between ours and others, we contendthat implicit rating can successfully replace explicit rating in CF and that the hybrid approach of CF andSPA is better than the individual ones.   2012 Elsevier B.V. All rights reserved. 1. Introduction In an age of information overload, the importance of personal-ized recommendation systems for online products and services israpidlygrowing.Suchsystemsallowbuyerstofindwhattheywantwithoutwastingtheirtimeandalsoenablesellerstoprovidebuyerswith the items they are likely to purchase, thereby furnishing ben-efits to both parties. As a result of this growing importance, funda-mentalknowledgeandtechniquesfordevelopingrecommendationsystems have been studied, includingcontent-based filtering(CBF)(Belkin and Croft 1992, Lang 1995, Mooney and Roy 1999, Pazzaniand Billsus 1997), collaborative filtering (CF) ( Joaquin and Naohiro1999, Nakamura and Abe 1998, Si and Jin 2003, Yu et al. 2004,2002), association rule or sequential pattern analysis (Aggarwalet al. 2002, Huang and Huang 2009, Wang et al. 2008), and hybridapproaches (Balabanovic and Shoham 1998, Liu et al. 2010, Salterand Antonopoulos 2006, Wei et al. 2008).A number of studies have attempted to resolve several typicalproblemsof eachrecommendationtechniquesuchas thenewuser(or cold start) problem(Kimet al. 2010, Park and Chang 2009), thenew item (or the first rater) problem (Balabanovic and Shoham1998, Lee et al. 2008), and the sparsity problem ( Jeong et al.2009, Kim et al. 2010, Lee and Olafsson 2009, Park and Chang2009). However, there are still issues for how online shoppingmallscanmakebetterrecommendationsfortheirusers. Thispaperproposes a novel approach to improving recommendation qualityand value, especially in the environment of e-commerce, focusingon the following two issues.Onlineshoppingmallsrarelyofferexplicitratinginformationbyusersfor itemsrequiredbytheCFtechnique( Jeonget al. 2009; Leeet al. 2010, 2008; Su et al. 2010). The technique has been widelyused and has proved to be useful in practice, but it has a criticallimitation – it cannot be adopted by recommendation systemsfor online shopping malls since explicit rating information onitems is not available. So, our first research issue is:   How can we derive implicit rating information from e-com-merce transaction data that can be used for the CF technique,instead of explicit rating information?In order for the CF technique to be used by more online shop-ping malls, it is necessary to derive implicit rating informationfromtransactiondata,whichcanbeusedasaproxyforexplicitrat-ing information. Based on the following two ideas, we defined afunction which computes implicit ratings from the transactiondataofuserswhopurchasedthesameitemsmanytimes:(1)auserwho buys an itemmore than once implies the user likes it; and (2)a user who buys an item more frequently than another user im-plies that the former likes it more than the latter.Many shopping malls have adopted sequential pattern analysis(SPA) to find temporal associations among items, but they may 1567-4223/$ - see front matter   2012 Elsevier B.V. All rights reserved. ⇑ Corresponding author. Tel.: +82 42 821 1290; fax: +82 42 821 1288. E-mail address: (G. Kim).Electronic Commerce Research and Applications 11 (2012) 309–317 Contents lists available at SciVerse ScienceDirect Electronic Commerce Research and Applications journal homepage:  suffer from a higher probability of inaccurate and biased recom-mendations for items because they consider just purchasing infor-mation rather than rating information. Since purchasing an itemdoes not always mean preferring it, the results of SPA may be use-ful but they will not always be useful for recommendation. Thisproblem can be mitigated by integrating SPA with CF, which usesrating information. In contrast, CF uses only rating information of items without reflecting the changes in user preference over time.This limitation of CF can be reduced by integrating CF with SPA,which returns changes in user preferences over time in the formof sequential patterns. So, our second research issue is:   How can we integrate CF and SPA to get better recommenda-tions than either of the two alone?In order to make CF and SPA supplement each other, we firstcalculated the predicted preference of a target user on an item torecommend from CF ( CFPP  ) and from SPA ( SPAPP  ). Then, we com-puted a linear summation of them by giving various weights toeach to get a final predicted preference ( FPP  ) of the target useron the item. Finally, the top  n  items with the highest  FPP   arerecommended.We implemented a Hybrid Online-Product rEcommendation(HOPE) system, which integrates CF-based recommendation usingimplicit ratings and SPA-based recommendations. Experiments tocompare the performance of the HOPE system with those of otherrecommendationsystemswereconductedusingadatasetofama- jor online shopping mall in Korea.The rest of this paper is organized as follows. Section 2 reviewsprevious works regarding recommendation systems. Section 3 de-scribes the overall framework for realizing our approach and pro-vides a detailed description of each step of the framework.Section 4 illustrates our approach with an example. In Section 5, we explain how much difference our approach makes based onthe results from four experiments and describe the implicationsof each experiment. The last section contains concluding remarks,including a summary, implications, and a limitation of thisresearch. 2. Previous works The techniques used in most of recent recommendation sys-tems can be generally categorized into four types: content-basedfiltering (CBF); collaborative filtering (CF); rule-based approaches;and hybrid approaches.CBF recommendation systems typically: (1) construct an  item profile  by extracting a set of features from each item in the itemset; (2) build a content-based  user profile  from a set of features of the items that each user purchased; (3) calculate the similarity be-tween the user profiles and the item profiles using a specific sim-ilarity function; and (4) recommend top  n  items with highsimilarityscores. That is, theyrecommenditemsbasedonthesim-ilarity between items to recommend and items already purchased.Initially, these systems were used to recommend documents suchasnetnews(Lang1995),webpages(PazzaniandBillsus1997),and books (Mooney and Roy 1999). Both the user profiles and the itemprofiles have an associated weight given to a set of keywords ex-tracted from documents using information retrieval techniques(Baeza-Yates and Ribeiro-Neto 1999, Salton 1988) or informationfiltering techniques (Belkin and Croft 1992). Since both profilesare represented by weight vectors, a similarity score is computedusing a heuristic function such as cosine similarity function orPearson correlation (Balabanovic and Shoham 1998, Lang 1995).Other techniques, such as classification models built from a statis-tical approach (Mooney and Roy 1999) or a data mining approach(Pazzani and Billsus 1997), have been used to classify whether adocument item is relevant to a user or not. CBF systems, however,have several limitations: (1) it is not easy to obtain a sufficientnumber of features to build profiles ( insufficient features problem )(Shardanand and Maes 1995); (2) recommended items are limitedto those that are similar to the items that a target user purchasedbefore ( over-specialization problem ) (Adomavicius and Tuzhilin2005); and (3) new users who have not purchased items or usersunusual in their preference cannot get a proper recommendation( new or unusual user problem ) (Adomavicius and Tuzhilin 2005,Billsus et al. 2002).CF-based recommendation systems typically: (1) build a userprofile from rating information of each user on items; (2) identifylike-minded users who rate items similarly to a target user using asimilarity function such as cosine similarity, Pearson correlationcoefficient, or distance-based similarity; and (3) recommend top n  items that the like-minded users preferred after their ratingsare predicted as an average, weighted sum or adjusted weightedsum of ratings given on items by the identified like-minded users.That is, they recommend items based on the similarity betweenusers. These methods of rating prediction are called  memory-based ( Joaquin and Naohiro 1999, Nakamura and Abe 1998, Si and Jin2003, Yu et al. 2004, 2002). Another method of rating prediction,called  model-based , is one in which a model such as a probabilisticmodel or a machine learning model is built from a large collectionof ratings and is used to predict ratings of items (Billsus andPazzani 1998, Cheung et al. 2003, Getoor and Sahami 1999,Goldberg et al. 2001, Hofmann 2003, 2004; Kumar et al. 2001,Marlin 2003, Pavlov and Pennock 2002, Pennock et al. 1999, Shaniet al. 2002). Many CF-based recommendation systems have beendeveloped, including Tapestry for recommending news articles(Goldberg et al. 1992), GroupLens for net news (Resnick et al. 1994), and Ringo for music (Shardanand and Maes 1995). CF recommendation systems, however, also have some limitations:(1) it is difficult to recommend items for users who have neverrated items before ( new user problem ) (Kim et al. 2010, Park andChang 2009); (2) it is difficult to recommend items which havenever been rated before ( new item problem ) (Balabanovic andShoham1998, Leeet al. 2008); and(3)theymakepoorrecommen-dations when rating information is insufficient ( sparsity problem )( Jeong et al. 2009, Kim et al. 2010, Lee and Olafsson 2009, Parkand Chang 2009).Another simple but popular way of recommending items to auser is the rule-based approach. Rules are derived from a largetransaction database collected over time, using data mining tech-niques. It could be either an  association rule  among items that arepurchased together (Aggarwal et al. 2002) or a  sequential pattern among items that are purchased in sequence over time (Huangand Huang 2009, Wang et al. 2008). The rule-based approach torecommending items, however, has a limitation in that it is diffi-cult to recommend items that do not appear in association rulesorsequentialpatterns.Aggarwaletal.(2002)proposedatechniquefor discovering localized association rules that are helpful for tar-get marketing. They first clustered market basket data using boththemushroomdataset andadult dataset inthe UCI machinelearn-ing repository ( and then derived associa-tion rules from each cluster. Huang and Huang (2009) proposed asequential pattern-based recommendation system that predictsthe customer’s time-variant purchase behavior in a supermarket.They first clustered customers and derived sequential patternsamong food items for each cluster in each time period. By takinginto account the dynamic nature of a customer’s purchase se-quences, they improved the recommendation quality.Hybrid recommendation systems have been developed to over-come, or at least to mitigate, the limitations of CBF, CF, and rule-based recommendation systems (Balabanovic and Shoham 1998,Liu et al. 2009, 2010; Salter and Antonopoulos 2006, Wei et al. 310  K. Choi et al./Electronic Commerce Research and Applications 11 (2012) 309–317   2008). The Fab system (Balabanovic and Shoham 1998) combines the CF and the CBF techniques to eliminate the  insufficient featuresand over-specialization problems  of CBF technique and  new item problem ofCFtechnique.Inthissystem,content-baseduserprofilesaremaintainedto determinesimilar users for collaborative recom-mendation, anditemsarerecommendedtoatarget userwhentwoconditions are simultaneously satisfied: (1) each item must have ahigh score against the target user’s profile; and (2) each itemshould be highly rated by users whose profiles are similar to thatof the target user.Liu et al. (2009) selectedthe top  k  neighbors fromthe cluster towhich a target user belongs using binary choice (purchased/not-purchased) analysis of shopping basket data and derived the pre-dictionscoresofitems(notyetpurchasedbythetargetuser)basedon the frequency count of them by scanning the purchase data of the  k  neighbors. Meanwhile, we selected neighbors from entireuser space based on the newly derived implicit rating informationofusersandwederivedthepredictionscoresofitems(notyetpur-chased or already purchased by the target user) based on the ad- justed weighted sum of ratings given on them by the  k neighbors. In addition, they divided the entire time period intothree periods and clustered transactions of users in each time per-iod,andthen,theyderivedsequentialpatternsrepresentedbyase-quence of transaction clusters over the three periods, while wederived sequential patterns represented by a sequence of itemsover the entire period. That is, they tried to find cluster sequence,while we tried to find item sequence. As such, our approach is be-lieved to be better than theirs in that we can make a more person-alized recommendation.Depending on the techniques to be applied to recommendationsystem, the types of information required for making recommen-dations arefairlydifferent fromone another. CBFrecommendationsystemshave used content information of items  to builduser profileand find similar items to the items that a target user purchasedbased on the content similarity (Albadvi and Shahbazi 2009, Leeand Kwon 2008; Liang et al. 2008, Park and Chang 2009, Ricciand Nguyen 2007, Salter and Antonopoulos 2006). On the otherhand, CF recommendation systems have used  rating informationof users on items  to represent the user’s preference on correspond-ingitems,andpredictratingsofatargetuseronitemsbasedontheuser similarity in ratings (Adomavicius and Kwon 2007, Goldberget al. 2001, Jeong et al. 2009, Kwon et al. 2009, Russell and Yoon2008).Rule-basedapproachhasused  purchase behavior information of users to derive meaningful association rules and sequential pat-terns and make recommendations based on them (Cho et al. 2002,Huang and Huang 2009, Jiang et al. 2009, Liu et al. 2009, Shih andLiu 2008). 3. Hybrid Online Product rEcommendation (HOPE) system We developed a recommendation system, called HOPE, whichintegrates CF-based recommendation using implicit rating andSPA-based recommendation. This section presents the overviewof the system, followed by the detailed description of each stepof the framework.  3.1. System overview Fig. 1 shows an overall framework of our recommendation sys-tem, HOPE system, which consists of two main processes: CF pro-cess and SPA process. The CF process, depicted in the upper leftpart of the figure, is the same as the traditional CF process, exceptthat an implicit rating derived from transaction data of users isused instead of explicit rating. Thus, it calculates the similarity be-tween a target user and other users using the implicit rating andselects the top  k  users based on the similarity score as neighborsof a target user. Finally, the predicted preferences of a target useron items purchased by the top  k  neighbors ( CFPP  ) are calculatedbased on the ratings of the neighbors. The SPA process, depictedin the upper right part of  Fig. 1, derives sequential patterns fromtransactiondataof other users, andpredictedpreferencesonitems( SPAPP  ) are calculated by matching all subsequences of a targetuser’s purchase sequence data with each derived sequential pat-tern. Finally, the weighted sum of normalized  CFPP   and  SPAPP   iscalculated as a final predicted preference ( FPP  ) on each candidateitem to recommend, and then the top  n  items with the highest FPP   are recommended.  3.2. Collaborative filtering-based recommendation 3.2.1. Deriving implicit ratings of users on items It is usually difficult to obtain explicit rating information onitems. In order to use the CF technique in such circumstance, thispaper suggests a method of deriving implicit ratings of users onitems from transaction data as an alternative to explicit ratings.The absolute preference of user  u  on item  i ,  AP  ( u , i ), is computedfrom the following equation.  AP  ð u ; i Þ¼ ln Thenumberof transactionsof user u includingitem i Thenumberof transactionsof user u  þ 1   ð 1 Þ It is computed solely based on the purchase data of user  u . This va-lue, however, is farfromrepresentingtheexactpreferenceof user  u on item  i  because it only takes into account the frequency of pur-chase and because the frequency is quite different depending ontheitemprice,itemlifetime,andthelike.Forexample,sinceexpen-sive items or items with long lifespan, such as jewelry or electrichomeappliances, are usuallypurchasedinfrequently. So theprefer-ences of users for themcannot be higher than cheap items or thosewith a short lifespan, such as hand creams or tissues. Also, when auser  u  purchased item  i  four times out of ten transactions (i.e.,  AP  ( u , i ) is 1.4), we may think that he does not prefer item  i  if otherusers purchased the same item eight times out of ten transactions,while we may think that he prefers item  i  if other users purchasedthe same item only once. It is therefore necessary to define relativepreference so it is comparable among users. The relative preferenceof user  u  on item  i ,  RP  ( u , i ) is thus defined as in the followingequation. RP  ð u ; i Þ ¼  AP  ð u ; i Þ Max c  2 U  ð  AP  ð c  ; i ÞÞ ð 2 Þ where U  denoteseveryuserwhopurchaseditem i .Notethat RP  rep-resentsa user’s preference to some extent, while  AP   does not. InEq.(2), we used a maximization function in the denominator. The rea-son for using maximization is to make  RP  ( u , i ) range from 0.0 to 1.0(i.e., normalization).Finally, we multiplied  RP  ( u , i ) by 5 and rounded up so that im-plicitratingrangesfrom1to 5, as is mostlyusedincurrentrecom-mendation systems, which is explained by the following equation. Implicit rating ð u ; i Þ ¼  Round up  ð 5  RP  ð u ; i ÞÞ ð 3 Þ  3.2.2. Calculating similarity score based on implicit ratings Withtheimplicitratingsofusersonitems,similaritybetweenatarget user and every other user is calculated, as is done intraditional CF technique, using such similarity function as Pearsoncorrelation coefficient (Albadvi and Shahbazi 2009, Kwon et al.2009,Leeetal.2008,RussellandYoon2008,SalterandAntonopoulos2006), cosine similarity ( Jeong et al. 2009, Lee et al. 2008,Symeonidis et al. 2008) or distance measures (Adomavicius and K. Choi et al./Electronic Commerce Research and Applications 11 (2012) 309–317   311  Kwon2007,Kimetal.2009,2008;ParkandChang2009).ThePear-son correlation coefficient estimates the similarity based on the rating pattern betweentwousers.Cosinesimilaritytreatstwousersas two vectors in the  m -dimensional rating vector space, where  m denotes the set of all items rated by both users, and estimates thesimilarity by calculating the  cosine value of the angle  between thetwo vectors. Finally, distance measure estimates the similarity be-tweenatargetuserandother userbycalculatingthe absolute mag-nitude of the similarity  between two users in the  m -dimensionalrating vector space, so that distance-based similarity is definedas an inverse of the distance. The three similarity functions are de-fined in Eqs. (4)–(6) as follows: Pearson correlation coefficient ð a ; b Þ¼ P mi ¼ 1 ð R ai   R a Þð R b ; i   R b Þ  ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiP mi ¼ 1 ð R a ; i   R a Þ 2 q ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiP mi ¼ 1 ð R b ; i   R b Þ 2 q   ð 4 Þ Cosine similarity ð a ; b Þ ¼ P mi ¼ 1 ð R a ; i Þð R b ; i Þ  ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiP mi ¼ 1 ð R a ; i Þ 2 q ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiP mi ¼ 1 ð R b ; i Þ 2 q   ð 5 Þ Distance-based similarity ð a ; b Þ ¼  11 þ  ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffiP mi ¼ 1 ð R a ; i   R b ; i Þ 2 q   ð 6 Þ where R a ; i ; R b ; i ; R a , and R b  denote theratings of users  a  and b  onitem i , average of all  R a,i  and average of all  R b,i , respectively.Since the above three similarity functions estimate the similar-ity between two users from different perspectives, depending onthe similarity functions to be used, the set of neighbors whose rat-inginformationisusedtopredictthepreferenceofatargetuseroncandidate items to recommend could be different, and thus, so aretheitemsfinallyrecommended.Itmeansthatthechoiceofsimilar-ityfunctionshouldbemadeproperlybasedonthedatasetathand.Therefore, to find a similarity function that is more appropriate forour data set, we attempted to use all these three similarity func-tions and compared their accuracy. Fig. 2 shows different perspec-tives of the three similarity functions conceptually.  3.2.3. Finding neighbors Having calculated the similarity between a target user andevery other user using each similarity function, users are sortedby similarity in descending order and then the top  k  users are se-lected as neighbors of target user  a . We also changed the numberof neighbors from1 to 2to 3 to4 to 5to findthe appropriatenum-ber of like-minded neighbors.  3.2.4. Calculating the CF-based predicted preference (CFPP) The rating information of the top  k  neighbors is then used topredictCF-basedpredictedpreferenceofuser  a  onitem i ,  CFPP  ( a , i ),as is shown in Eq. (7). CFPP  ð a ; i Þ ¼  R a  þ  1 P kb ¼ 1 j sim ð a ; b Þj  P kb ¼ 1 sim ð a ; b Þð R b ; i   R b Þ ;  ð 7 Þ where  k  denotes the number of user  a ’s neighbors and  sim ( a , b ) de-notes the similarity between users  a  and  b , which is computedusing Pearson correlation coefficient, cosine similarity or distancemeasure.  3.3. Sequential pattern analysis-based recommendation 3.3.1. Deriving sequential patterns In order to calculating predicted preferences of items based onSPA method, sequence data of each user is generated firstly bysortingtransactiondataforthepersonaccordingtothetransactiondate. Sequence data is a series of item sets, ordered by theirpurchase time stamp. And then, sequential patterns are derivedfrom sequence data of users except a target user using the SPAmethod. Collaborative Filtering (CF) Calculate similarity score based on implicit ratingsFind neighborsCalculate CF-based predicted preference (CFPP) Sequential Pattern Analysis (SPA) Match subsequences of a target userwith derived sequential patternsDerive sequential patternsCalculate SPA-based predicted preference (SPAPP)Recommend itemsDerive implicit ratings of usersIntegrate CFPP and SPAPP Fig. 1.  Overall framework of HOPE system. Fig. 2.  Different perspectives of three similarity functions.312  K. Choi et al./Electronic Commerce Research and Applications 11 (2012) 309–317 
Similar documents
View more...
Related Search
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks

We need your sign to support Project to invent "SMART AND CONTROLLABLE REFLECTIVE BALLOONS" to cover the Sun and Save Our Earth.

More details...

Sign Now!

We are very appreciated for your Prompt Action!