Clustering results don't converge

Comments

3 comments

  • Joel Stewart

    Fritz, thank you for the question. As documented on our Clustering page, Datameer uses the standard k-means algorithm, not the k-means++ algorithm. As you highlighted, the standard k-means algorithm does have limitations when selecting initial cluster centers, and this could be the root cause of the inconsistent cluster results for this particular data sample.

    It may be possible to create a k-means++ style algorithm using our SDK. Alternatively, our Professional Services team could scope building a solution that uses the k-means++ algorithm.
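
For readers unfamiliar with the difference, here is a minimal sketch of k-means++ style seeding in plain Python. It is not Datameer's implementation and does not use the Datameer SDK; the function names `squared_distance` and `kmeans_pp_init` are hypothetical and chosen only for illustration. The idea is that each new initial center is drawn with probability proportional to its squared distance from the nearest center already chosen, which tends to spread the starting centers out and reduces the run-to-run variation that purely random seeding can cause.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng=None):
    """Pick k initial centers with k-means++ style seeding (illustrative sketch).

    points: list of tuples of numbers; k: number of centers;
    rng: optional random.Random instance for reproducible runs.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to a uniform pick.
            nxt = rng.choice(points)
        else:
            # Far-away points get proportionally higher selection probability.
            nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        # Update nearest-center distances with the newly added center.
        d2 = [min(old, squared_distance(p, nxt)) for p, old in zip(points, d2)]
    return centers
```

The returned centers would then be handed to an ordinary k-means loop as its starting point. Passing a seeded `random.Random` makes a run repeatable, which can help when comparing cluster results across executions.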

  • Fritz Schinkel

    Hi Joel, thanks for the answer. Sorry to hear that Datameer does not use k-means++. Since Mahout and SparkML are two k-means implementations on Hadoop that do the improved initialization, I thought Datameer would leverage one of them and do the same. Do you see a chance to add a k-means++ option in one of the next releases? We could connect SparkML k-means++ via the SDK, as we did with other machine learning algorithms, but then it is hard to argue for Datameer's value-add.

  • Joel Stewart

    Thank you for the update, Fritz. On your behalf, I've created an enhancement request, which is tracked internally as DAP-34471.

    Please note that enhancement requests are evaluated and considered by the Datameer Product Management team. Not all enhancements will be implemented in a future Datameer release. Even if Datameer Product Management selects an enhancement request for a future release, an estimated release date or version may not be available.
