SPSS TWOSTEP CLUSTER – A FIRST EVALUATION*

Johann Bacher†, Knut Wenzig‡, Melanie Vogler§

Universität Erlangen-Nürnberg

SPSS 11.5 and later releases offer a two step clustering method. To the authors' knowledge, the procedure has not been used in the social sciences until now. This situation is surprising: The widely used clustering algorithms, k-means clustering and agglomerative hierarchical techniques, suffer from well known problems, whereas SPSS TwoStep clustering promises to solve at least some of these problems. In particular, mixed type attributes can be handled and the number of clusters is automatically determined. These properties are promising. Therefore, SPSS TwoStep clustering is evaluated in this paper by a simulation study.

Summarizing the results of the simulations, SPSS TwoStep performs well if all variables are continuous. The results are less satisfactory if the variables are of mixed type. One reason for this unsatisfactory finding is the fact that differences in categorical variables are given a higher weight than differences in continuous variables. Different combinations of the categorical variables can dominate the results. In addition, SPSS TwoStep clustering is not able to correctly detect models with no cluster structure. Latent class models show a better performance. They are able to detect models with no underlying cluster structure, they result more frequently in correct decisions and in less biased estimators.

Keywords: SPSS TwoStep clustering, mixed type attributes, model based clustering, latent class models

1 INTRODUCTION

SPSS 11.5 and later releases offer a two step clustering method (SPSS 2001, 2004). According to the authors' knowledge the procedure has not been used in the social sciences until now. This situation is surprising: The widely used clustering algorithms, k-means clustering and agglomerative hierarchical techniques, suffer from well known problems (for instance, Bacher 2000: 223; Everitt et al. 2001: 94-96; Huang 1998: 288), whereas SPSS TwoStep clustering promises to solve at least some of these problems. In particular, mixed type attributes can be handled and the number of clusters is automatically determined. These properties are promising.

* AUTHORS' NOTE: The study was supported by the Staedtler Stiftung Nürnberg (Project: "Was leisten Clusteranalyseprogramme? Ein systematischer Vergleich von Programmen zur Clusteranalyse"). We would like to thank SPSS Inc. (Technical Support), Jeroen Vermunt and David Wishart for invaluable comments on an earlier draft of the paper and Bettina Lampmann-Ende for her help with the English version.
† bacher@wiso.uni-erlangen.de
‡ knut@wenzig.de
§ vogler.m@gmx.de


Therefore, SPSS TwoStep clustering will be evaluated in this paper. The following questions will be analyzed:

1. How is the problem of commensurability (different scale units, different measurement levels) solved?
2. Which assumptions are made for models with mixed type attributes?
3. How well does SPSS TwoStep, especially the automatic detection of the number of clusters, perform in the case of continuous variables?
4. How well does SPSS TwoStep, especially the automatic detection of the number of clusters, perform in the case of variables with different measurement levels (mixed type attributes)?

The model of SPSS TwoStep clustering will be described in the next section. The evaluation will be done in section 3. Section 4

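2 SPSS TWOSTEP CLUSTERING

where

$$\xi_i = -n_i \left( \sum_{j=1}^{p} \frac{1}{2} \log\left(\hat\sigma_{ij}^2 + \hat\sigma_j^2\right) - \sum_{j=1}^{q} \sum_{l=1}^{m_j} \hat\pi_{ijl} \log\left(\hat\pi_{ijl}\right) \right) \tag{2}$$

$$\xi_s = -n_s \left( \sum_{j=1}^{p} \frac{1}{2} \log\left(\hat\sigma_{sj}^2 + \hat\sigma_j^2\right) - \sum_{j=1}^{q} \sum_{l=1}^{m_j} \hat\pi_{sjl} \log\left(\hat\pi_{sjl}\right) \right) \tag{3}$$

$$\xi_{\langle i,s \rangle} = -n_{\langle i,s \rangle} \left( \sum_{j=1}^{p} \frac{1}{2} \log\left(\hat\sigma_{\langle i,s \rangle j}^2 + \hat\sigma_j^2\right) - \sum_{j=1}^{q} \sum_{l=1}^{m_j} \hat\pi_{\langle i,s \rangle jl} \log\left(\hat\pi_{\langle i,s \rangle jl}\right) \right) \tag{4}$$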

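$\xi_v$ can be interpreted as a kind of dispersion (variance) within cluster $v$ ($v = i, s, \langle i,s \rangle$). $\xi_v$ consists of two parts. The first part, $-n_v \sum_{j=1}^{p} \frac{1}{2} \log(\hat\sigma_{vj}^2 + \hat\sigma_j^2)$, measures the dispersion of the continuous variables $x_j$ within cluster $v$. If only $\hat\sigma_{vj}^2$ were used, $d(i,s)$ would be exactly the decrease in the log-likelihood function after merging clusters $i$ and $s$. The term $\hat\sigma_j^2$ is added to avoid the degenerate situation $\hat\sigma_{vj}^2 = 0$. The entropy $-n_v \sum_{j=1}^{q} \sum_{l=1}^{m_j} \hat\pi_{vjl} \log(\hat\pi_{vjl})$ is used in the second part as a measure of dispersion for the categorical variables.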
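The two parts of $\xi_v$ can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in the text above: the variances are computed with divisor $n$, and the distance is taken to be $d(i,s) = \xi_i + \xi_s - \xi_{\langle i,s \rangle}$. The names `ml_variance`, `xi` and `distance` and the tuple layout of a cluster are hypothetical and chosen only for illustration; this is not SPSS code.

```python
import math
from collections import Counter


def ml_variance(values):
    """Variance with divisor n (assumed estimator for the sigma-hat terms)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def xi(cont_cols, cat_cols, overall_var):
    """Dispersion xi_v of one cluster.

    cont_cols:   p lists, one per continuous variable (cases in the cluster)
    cat_cols:    q lists, one per categorical variable (cases in the cluster)
    overall_var: p variances of the continuous variables in the whole data set
    """
    n = len(cont_cols[0]) if cont_cols else len(cat_cols[0])
    # First part: sum over j of 1/2 * log(within-cluster variance + overall variance).
    continuous = sum(
        0.5 * math.log(ml_variance(col) + s2)
        for col, s2 in zip(cont_cols, overall_var)
    )
    # Second part: sum over j and l of pi * log(pi); empty categories are skipped.
    categorical = 0.0
    for col in cat_cols:
        for count in Counter(col).values():
            share = count / n
            categorical += share * math.log(share)
    return -n * (continuous - categorical)


def distance(cluster_i, cluster_s, overall_var):
    """Assumed distance d(i,s) = xi_i + xi_s - xi_<i,s>.

    A cluster is a tuple (cont_cols, cat_cols).
    """
    merged = tuple(
        [a + b for a, b in zip(cols_i, cols_s)]
        for cols_i, cols_s in zip(cluster_i, cluster_s)
    )
    return (xi(*cluster_i, overall_var) + xi(*cluster_s, overall_var)
            - xi(*merged, overall_var))
```

For two pure clusters of two cases each that differ in one categorical variable, for example `distance(([], [["a", "a"]]), ([], [["b", "b"]]), [])`, the sketch returns $4 \log 2$, the entropy introduced by the merge.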

Similar to agglomerative hierarchical clustering, those clusters with the smallest distance $d(i,s)$ are merged in each step. The log-likelihood function for the step with $k$ clusters is computed as

$$l_k = \sum_{v=1}^{k} \xi_v. \tag{5}$$

The function $l_k$ is not the exact log-likelihood function (see above). The function can be interpreted as dispersion within clusters. If only categorical variables are used, $l_k$ is the entropy within $k$ clusters.

Number of clusters. The number of clusters can be automatically determined. A two phase estimator is used. Akaike's Information Criterion

$$AIC_k = -2 l_k + 2 r_k \tag{6}$$

where $r_k$ is the number of independent parameters, or the Bayesian Information Criterion

$$BIC_k = -2 l_k + r_k \log n \tag{7}$$

is computed in the first phase. $BIC_k$ or $AIC_k$ result in a good initial estimate of the maximum number of clusters (Chiu et al. 2001: 266). The maximum number of clusters is set equal to the number of clusters where the ratio $BIC_k / BIC_1$ is smaller than $c_1$ (currently $c_1 = 0.04$)¹ for the first time (personal information of SPSS Technical Support). In table 1 this is the case for eleven clusters.
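The first phase can be sketched in a few lines of Python. The sketch follows the rule as worded above (first $k$ at which the ratio to the one-cluster value falls below $c_1$). The function names, the fallback when the ratio never falls below $c_1$, and the criterion values in the usage lines are assumptions made for illustration, not values from table 1.

```python
import math


def aic(l_k, r_k):
    """AIC_k = -2 l_k + 2 r_k, equation (6)."""
    return -2.0 * l_k + 2.0 * r_k


def bic(l_k, r_k, n):
    """BIC_k = -2 l_k + r_k log n, equation (7)."""
    return -2.0 * l_k + r_k * math.log(n)


def max_clusters_first_phase(criterion, c1=0.04):
    """Return the first k whose ratio criterion_k / criterion_1 is below c1.

    criterion[0] holds the value for one cluster, criterion[1] for two, etc.
    """
    first = criterion[0]
    for k, value in enumerate(criterion, start=1):
        if value / first < c1:
            return k
    # Assumption: if the ratio never drops below c1, use the largest k examined.
    return len(criterion)


# Made-up criterion values for k = 1, ..., 5 (not taken from the paper).
values = [1000.0, 400.0, 100.0, 30.0, 20.0]
k_max = max_clusters_first_phase(values)
```

With these made-up values the ratios are 1, 0.4, 0.1, 0.03 and 0.02, so the maximum number of clusters handed to the second phase is four.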

The second phase uses the ratio change $R(k)$ in distance for $k$ clusters, defined as

$$R(k) = d_{k-1} / d_k, \tag{8}$$

¹ The value is based on simulation studies of the authors of SPSS TwoStep Clustering (personal information of SPSS Technical Support, 2004-05-24).


where $d_{k-1}$ is the distance if $k$ clusters are merged to $k-1$ clusters. The distance $d_k$ is defined similarly.² The number of clusters is obtained for the solution where a big jump of the ratio change occurs.³

The ratio change is computed as

$$R(k_1) / R(k_2) \tag{11}$$

for the two largest values of $R(k)$ ($k = 1, 2, \ldots, k_{max}$; $k_{max}$ obtained from the first step). If the ratio change is larger than the threshold value $c_2$ (currently $c_2 = 1.15$⁴), the number of clusters is set equal to $k_1$; otherwise the number of clusters is set equal to the solution with $\max(k_1, k_2)$. In table 1, the two largest values of $R(k)$ are reported for three clusters ($R(3) = 2.129$; largest value) and for eight clusters ($R(8) = 1.952$). The ratio is 1.091 and smaller than the threshold value of 1.15. Hence the maximum of 3 and 8, i.e. the solution with eight clusters, is selected as the best solution.
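The decision rule of the second phase can be written down as a short Python sketch. The function names are hypothetical. In the usage lines only $R(3) = 2.129$ and $R(8) = 1.952$ come from the text; the other two entries are made-up fillers that stand in for the remaining rows of table 1.

```python
def ratio_changes(d):
    """R(k) = d_{k-1} / d_k for every k where both distances are available.

    d maps the number of clusters k to the merge distance d_k.
    """
    return {k: d[k - 1] / d[k] for k in d if k - 1 in d}


def select_number_of_clusters(r, c2=1.15):
    """Apply the threshold rule to the two largest ratio changes.

    r maps k to R(k). k1 has the largest R(k), k2 the second largest.
    """
    ranked = sorted(r, key=r.get, reverse=True)
    k1, k2 = ranked[0], ranked[1]
    if r[k1] / r[k2] > c2:
        return k1
    return max(k1, k2)


# R(3) and R(8) as reported in the text; R(2) and R(5) are hypothetical.
r_values = {2: 1.300, 3: 2.129, 5: 1.100, 8: 1.952}
chosen = select_number_of_clusters(r_values)
```

Here $2.129 / 1.952 \approx 1.091$ does not exceed 1.15, so the sketch returns $\max(3, 8) = 8$, in line with the example from table 1.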

Cluster membership assignment. Each object is assigned deterministically to the closest cluster according to the distance measure used to find the clusters. The deterministic assignment may result in biased estimates of the cluster profiles if the clusters overlap (Bacher 1996: 311–314, Bacher 2000).

Modification. The procedure allows an outlier treatment to be defined. The user must specify a value for the fraction of noise, e.g. 5 (= 5%). A leaf (pre-cluster) is considered a potential outlier cluster if its number of cases is less than the defined fraction of the maximum cluster size. Outliers are ignored in the second step.
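The outlier rule can be sketched as follows. The sketch assumes that "maximum cluster size" refers to the size of the largest leaf; the function name and the leaf sizes in the usage lines are hypothetical.

```python
def potential_outlier_leaves(leaf_sizes, noise_percent=5):
    """Return the indices of leaves that are smaller than the noise threshold.

    Assumption: the threshold is noise_percent percent of the largest leaf.
    """
    threshold = noise_percent / 100 * max(leaf_sizes)
    return [i for i, size in enumerate(leaf_sizes) if size < threshold]


# Hypothetical leaf sizes: the threshold is 5% of 200 = 10 cases.
flagged = potential_outlier_leaves([200, 150, 8, 40, 3])
```

With these sizes the leaves holding 8 and 3 cases (indices 2 and 4) are flagged and would be ignored in the second step.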

Output. Compared to the k-means algorithm (QUICK CLUSTER) or agglomerative hierarchical techniques (CLUSTER), SPSS has improved the output significantly. An additional module allows the user to statistically test the influence of variables on the classification and to compute confidence levels.

3 EVALUATION

3.1 Commensurability

Clustering techniques (k-means clustering, hierarchical techniques etc.) require commensurable variables (for instance, Fox 1982). This implies interval or ratio scaled variables with equal scale units. In the case of different scale units, the variables are usually standardized by the range (normalized to the range [0,1], range weighted) or z-transformed to have zero mean and unit standard deviation (autoscaling, standard scoring, standard deviation weights). If the variables have different measurement levels, either a general distance measure (like Gower's general similarity measure; Gower 1971) may be used or the nominal (and ordinal) variables may be transformed to dummies and treated as quantitative⁵ (Bender et al. 2001; Wishart 2003).
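The usual transformations named above can be sketched in plain Python. The function names are hypothetical, and the z-transformation uses the population standard deviation (divisor $n$), which is an assumption; software packages differ on this point.

```python
import statistics


def range_standardize(values):
    """Normalize a variable to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def z_transform(values):
    """Transform a variable to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: divisor n, not n - 1
    return [(x - mean) / sd for x in values]


def dummy_code(values):
    """Turn a nominal variable into one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


scaled = range_standardize([10, 20, 30])
dummies = dummy_code(["a", "b", "a"])
```

After either standardization the continuous variables share a common scale, while each dummy contributes differences of 0 or 1. How these two kinds of differences are weighted against each other is the commensurability question raised in this section.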

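² The distances $d_k$ can be computed from the output in the following way:

$$d_k = l_{k-1} - l_k \tag{9}$$

$$l_v = (r_v \log n - BIC_v)/2 \quad \text{or} \quad l_v = (2 r_v - AIC_v)/2 \quad \text{for } v = k, k-1 \tag{10}$$

However, using BIC or AIC results in different solutions.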
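As a minimal sketch, equations (9) and (10) translate directly into Python. The function names are hypothetical; the inputs would be the criterion values and parameter counts read from the SPSS output.

```python
import math


def loglik_from_bic(bic_v, r_v, n):
    """l_v = (r_v log n - BIC_v) / 2, first form of equation (10)."""
    return (r_v * math.log(n) - bic_v) / 2.0


def loglik_from_aic(aic_v, r_v):
    """l_v = (2 r_v - AIC_v) / 2, second form of equation (10)."""
    return (2.0 * r_v - aic_v) / 2.0


def merge_distance(l_k_minus_1, l_k):
    """d_k = l_{k-1} - l_k, equation (9)."""
    return l_k_minus_1 - l_k
```

Both forms of equation (10) simply invert the definitions of $BIC_k$ and $AIC_k$ in equations (6) and (7).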

³ The exact decision rules are described vaguely in the relevant literature and the software documentation (SPSS 2001; Chiu et al. 2001). Therefore, we report the exact decision rule based on personal information of SPSS Technical Support. A documentation in the output, like "solution x was selected because ...", would be helpful for the user.

⁴ Like $c_1$, $c_2$ is based on simulation studies of the authors of SPSS TwoStep Clustering (personal information of SPSS Technical Support, 2004-05-24).

⁵ The term "quantitative" will be used for interval or ratio scaled variables.
