Smart Execution, In-Memory Jobs, Workbook Size


Dear,

We are testing and measuring performance in a new environment: Datameer on a Hadoop cluster, compared against Datameer in local mode.

1) The Workbooks that use Data Links are considerably faster on the Hadoop cluster, which is understandable, but the in-memory jobs are considerably slower. Can you please advise which setting(s) to change in order to improve this?

2) Smart Execution is not used on the Hadoop cluster, although the Spark and Tez plug-ins are enabled in Datameer. Also, all jobs are executed as "Standard MR job". How can we enable Smart Execution, and what are the prerequisites? (Maybe this is connected with issue no. 1.)

3) The Workbooks are considerably larger on the Hadoop cluster. This is not due to replication, and we are using Gzip compression. Any advice on that (maybe LZO is used as the default in local mode)?

Looking forward to your answers, and thanks in advance.

Best Regards,

Aleksandar Razmovski



  • Konsta Danyliuk

    Hello Aleksandar.

    1. Without information about the exact cluster settings, it is hard to say what exactly might be causing the slowness in your case. Perhaps it is related to the execution engine used or to memory settings. We could investigate further if you provide more information about the cluster and Datameer configuration (version, memory allocation, etc.).

    2. In order to use Smart Execution, you would need to obtain the appropriate license which enables this feature.

      Meanwhile, you could try different execution frameworks (e.g. Tez) by setting the custom property das.execution-framework=Tez, either globally on the Hadoop Cluster page or individually for a job; a sketch follows at the end of this comment.

    3. The LZO codec is not used as the default compression in Datameer.

      To make a fair comparison of workbook sizes in local and cluster mode, I would suggest creating a baseline test workbook in each mode and checking its size. You could use the instructions from the article How to Generate Normal Distributed Random Values to build the workbook for a clean test.
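    For illustration, the global override mentioned in point 2 could look as follows (a sketch: the property name and the Tez value come from point 2 above; values for other engines depend on your Datameer version and enabled plug-ins):

    das.execution-framework=Tez

    The same line can instead be placed in a single job's custom properties to limit the change to that job.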

  • Aleksandar Razmovski

    Dear,

    About 1):

    I've noticed one odd thing in the job log:

    [system]  INFO [2016-11-11 09:56:38.877] [ConcurrentJobExecutor-0] (JdbcSplitter.java:71) - SplitHint{numMapTasks=149, minSplitSize=0, maxSplitSize=9223372036854775807, minSplitCount=0, maxSplitCount=4}

    While the settings in mapred-site.xml are:

    <property>
      <name>mapreduce.input.fileinputformat.split.maxsize</name>
      <value>536870912</value>
    </property>
    <property>
      <name>mapreduce.input.fileinputformat.split.minsize</name>
      <value>134217728</value>
    </property>
    <property>
      <name>mapreduce.job.max.split.locations</name>
      <value>5</value>
    </property>

    Where does Datameer get the numbers in the SplitHint above? (Notably, the logged maxSplitSize of 9223372036854775807 is 2^63 - 1, i.e. Long.MAX_VALUE in Java, so it looks as if no explicit maximum split size was applied there.)

    Thanks in advance.

    Best Regards,

    Aleksandar Razmovski

  • Aleksandar Razmovski

    Dear,

    Also, the RAM allocated to the containers never surpasses 2 GB, although the settings suggest otherwise:

        dfs.datanode.data.dir = /localservices/hdfs_data
        yarn.scheduler.capacity.maximum-am-resource-percent = 20
        yarn.nodemanager.vmem-check-enabled = false
        dfs.permissions.superusergroup = hdfs
        mapreduce.map.cpu.vcores = 5
        mapreduce.map.speculative = false
        mapreduce.output.fileoutputformat.compress = true
        mapreduce.task.io.sort.mb = 1024
        mapreduce.reduce.cpu.vcores = 5
        yarn.scheduler.minimum-allocation-vcores = 1
        mapreduce.reduce.memory.mb = 4096
        dfs.namenode.checkpoint.dir = /localservices/secondary1,/localservices/secondary2
        yarn.nodemanager.local-dirs = /localservices/yarn-local
        mapreduce.job.max.split.locations = 5
        yarn.resourcemanager.address = ${yarn.resourcemanager.hostname}:8032
        yarn.scheduler.increment-allocation-mb = 512
        mapreduce.map.output.compress = true
        hadoop.tmp.dir = /localservices/tmp
        yarn.nodemanager.vmem-pmem-ratio = 2.1
        yarn.application.classpath =  $HADOOP_CONF_DIR,/usr/lib/hadoop-yarn/*,/usr/lib/hadoop-mapreduce/*,/usr/lib/hadoop-hdfs/*,/usr/lib/*,/usr/lib/hadoop-hdfs/lib/*,/usr/lib/hadoop-mapreduce/lib/*,/usr/lib/hadoop-yarn/lib/*,/usr/lib/hadoop/*,/usr/lib/hadoop/lib/*
        dfs.namenode.name.dir = /localservices/hadoopname1,/localservices/hadoopname2
        fs.tachyon.impl = tachyon.hadoop.TFS
        yarn.nodemanager.resource.memory-mb = 12288
        yarn.nodemanager.resource.cpu-vcores = 7
        mapred.child.java.opts = -Xmx3277m
        yarn.scheduler.minimum-allocation-mb = 512
        mapreduce.framework.name = yarn
        dfs.blocksize = 134217728
        mapreduce.input.fileinputformat.split.minsize = 134217728
        mapreduce.reduce.java.opts = -Xmx3277m
        mapreduce.map.java.opts = -Xmx3277m
        mapreduce.input.fileinputformat.split.maxsize = 536870912
        yarn.resourcemanager.hostname = hdp2.carrierzone.com
        yarn.scheduler.maximum-allocation-mb = 4096
        io.compression.codecs = org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec
        yarn.scheduler.maximum-allocation-vcores = 5
        yarn.nodemanager.aux-services = mapreduce_shuffle
        mapreduce.map.memory.mb = 4096
        yarn.app.mapreduce.am.command-opts = -Xmx922m
        yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
        yarn.app.mapreduce.am.resource.mb = 1024
        mapreduce.job.ubertask.enable = true
        dfs.replication = 2
        yarn.app.mapreduce.am.resource.cpu-vcores = 1
        yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
        fs.defaultFS = hdfs://hdp1:8020/
        mapreduce.output.fileoutputformat.compress.codec = org.apache.hadoop.io.compress.GzipCodec
        mapreduce.map.output.compress.codec = org.apache.hadoop.io.compress.GzipCodec
        mapreduce.reduce.speculative = false
        JVM Version = Java HotSpot(TM) 64-Bit Server VM, 1.7 (Oracle Corporation)
        JVM Opts = -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx1638m -Djava.io.tmpdir=/localservices/yarn-local/usercache/datameer/appcache/application_1479209793677_0005/container_1479209793677_0005_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1479209793677_0005/container_1479209793677_0005_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog


    Best Regards,

    Aleksandar Razmovski

  • Gido

    Hi Aleksandar,

    As of Datameer version 5.6, we use new custom properties as an abstraction layer over the various execution frameworks.

    If you would like to configure memory-related properties, you can change the default settings, which are currently:

    das.job.map-task.memory=2048 
    das.job.reduce-task.memory=2048
    das.job.application-manager.memory=2048

    e.g. to:

    das.job.map-task.memory=4096
    das.job.reduce-task.memory=4096
    das.job.application-manager.memory=4096

    These defaults would also explain why your containers never exceed 2 GB: Datameer requests the memory set by the das.job properties for its tasks, regardless of the mapreduce.*.memory.mb values in mapred-site.xml. To determine suitable memory settings, you may also have a look at distribution-specific documentation, e.g. from Cloudera or Hortonworks.

    Additionally, remove execution-framework-specific parameters from Datameer's configuration, e.g.:

    mapred.map.child.java.opts=-Xmx<value>m 
    mapred.reduce.child.java.opts=-Xmx<value>m
    mapred.job.map.memory.mb=<value>
    mapred.job.reduce.memory.mb=<value>
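    Putting this together, the custom properties on the Hadoop Cluster page might then contain only the das.* variants, for example (a sketch reusing values from this thread; adjust the numbers to your cluster's capacity):

    das.execution-framework=Tez
    das.job.map-task.memory=4096
    das.job.reduce-task.memory=4096
    das.job.application-manager.memory=4096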


  • Aleksandar Razmovski

    Dear,

    That explains a lot. Thanks.

    By the way, can I set these custom properties at the job/task level (Workbook, Import Job, ...)?


    Best Regards,

    Aleksandar Razmovski

  • Konsta Danyliuk

    Hello Aleksandar.

    You can set the memory settings at the job level; just enter the required parameters in the artifact's Custom Properties field (usually located under the Advanced section of the artifact's configuration).
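    For example, a Workbook's Custom Properties field could then contain lines such as the following (a sketch reusing the memory properties from Gido's comment; the 4096 values are illustrative):

    das.job.map-task.memory=4096
    das.job.reduce-task.memory=4096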
