Some folders take a very long time to list in the file browser

I have some folders that take up to 30 seconds to list in the file browser of Datameer 7.1.3. Any idea what could be causing this?

Michael Ahn

  • Michael Ahn

    I have now separated the objects by type into different directories. The directory containing only 10 export jobs is still slow; all the others are fast. Retrieving the status of the export jobs seems to be what slows down the directory listing.

  • Alan Mark

    Hi Michael,

    I have a folder with 12 export jobs in my lab environment on 7.1.4, and it lists just as fast as my other folders containing workbooks or other artifact types.

    Can you tell us a little more about the export jobs themselves? Where are they exporting to? Are they using JDBC or a native connector of some kind? If JDBC, which driver specifically? It seems like there must be something particular about either the environment or the export jobs themselves, so I'd like to replicate your artifacts more closely.

  • Michael Ahn

    Hi Alan,

    All 10 of these exports copy their files via SCP to a Linux server, and all of them use the same connection:

    {
      "version": "7.1.3",
      "className": "datameer.dap.sdk.entity.CustomDataStore",
      "file": {
        "uuid": "15fefd67-293f-4eb6-93b5-3276d2457c8b",
        "path": "/Data/Connections/bdrx1.dst",
        "description": "SCP auf bdrx1",
        "name": "bdrx1"
      },
      "typeId": "das.ScpFileDataStoreType",
      "properties": {
        "GenericConfigurationImpl.temp-file-store": [
          "609f972c-e9dc-49af-92ec-4ebd02933507"
        ],
        "dataStoreTemplate": [
          "false"
        ],
        "dataStoreUsage": [
          "IMPORT_EXPORT"
        ],
        "host": [
          "192.168.239.131"
        ],
        "password": [
          "SECURE:0:xxx"
        ],
        "port": [
          "22"
        ],
        "rootPathPrefix": [
          "/"
        ],
        "sshKey": [
          "SECURE:0:xxx"
        ],
        "user_name": [
          "root"
        ]
      }
    }

    One example:

    {
      "version": "7.1.3",
      "className": "datameer.dap.common.entity.FileDataSinkImpl",
      "file": {
        "uuid": "a2300ed5-3db9-492f-adec-c35c41e10405",
        "path": "/Sentiment/Skydeck/Hide/people_CitiesExport_export.exp",
        "description": "Export of located tweets by time for Skydeck",
        "name": "people_CitiesExport_export"
      },
      "pullType": "WHEN_NEW_DATA_COMES_IN",
      "minKeepCount": 1,
      "properties": {
        "character_encoding": [
          "UTF-8"
        ],
        "delimiter": [
          "\\t"
        ],
        "escapeCharacter": [
          "\\"
        ],
        "filename": [
          "/basedirs/skydeck/wwwdocs/html/Skydeck_people.tsv"
        ],
        "header": [
          "true"
        ],
        "quoteCharacter": [
          ""
        ],
        "replace_data": [
          "true"
        ]
      },
      "hadoopProperties": "das.splitting.max-split-count\u003d1",
      "connection": "/Data/Connections/bdrx1",
      "sheet": {
        "name": "CitiesExport",
        "sheetId": "dff914a2-9b1a-4407-97a6-e62ba79712e7",
        "workbook": {
          "path": "/Sentiment/Skydeck/people_tweet_analysis.wbk",
          "uuid": "dc6eb790-d11b-44e1-b285-f8eaabd28751"
        }
      },
      "mappings": [
        {
          "name": "location",
          "srcColumnIndex": 0,
          "nullable": true
        },
        {
          "name": "Created",
          "srcColumnIndex": 1,
          "nullable": true,
          "pattern": "yyyy-MM-dd HH:mm:ss"
        },
        {
          "name": "Count",
          "srcColumnIndex": 2,
          "nullable": true
        },
        {
          "name": "Average_Polarity",
          "srcColumnIndex": 3,
          "nullable": true
        },
        {
          "name": "Count_positive",
          "srcColumnIndex": 4,
          "nullable": true
        },
        {
          "name": "Count_negative",
          "srcColumnIndex": 5,
          "nullable": true
        },
        {
          "name": "Count_neutral",
          "srcColumnIndex": 6,
          "nullable": true
        },
        {
          "name": "Percent_positive",
          "srcColumnIndex": 7,
          "nullable": true
        },
        {
          "name": "Percent_negative",
          "srcColumnIndex": 8,
          "nullable": true
        },
        {
          "name": "Percent_neutral",
          "srcColumnIndex": 9,
          "nullable": true
        },
        {
          "name": "lat",
          "srcColumnIndex": 10,
          "nullable": true
        },
        {
          "name": "lon",
          "srcColumnIndex": 11,
          "nullable": true
        },
        {
          "name": "Bundesland",
          "srcColumnIndex": 12,
          "nullable": true
        },
        {
          "name": "Land",
          "srcColumnIndex": 13,
          "nullable": true
        }
      ],
      "errorHandlingMode": "DROP_RECORD",
      "notificationAddresses": "",
      "notificationSuccessAddresses": "",
      "confirmOverwrite": false,
      "exportFileType": "datameer.dap.common.csv.CsvExportFileType"
    }

  • Alan Mark

    Hi Michael,

    I recreated a similar structure using a single connection and multiple export jobs over SCP to my host. Unfortunately, the behavior did not reproduce for me.

    Is this happening with other folders containing export jobs? I imagine you have more than one folder containing jobs for exporting data.

    If you move these jobs to a different folder, does that folder start having the problem?

    Have you tried accessing this folder from an incognito window, or from a different browser, to rule out a browser cache issue?

  • Michael Ahn

    I don't have the problem with export jobs in general. The jobs causing the problem are executed every 15 minutes, and I keep job execution logs for 4 days. Maybe that is what causes the slow response time. I have another folder containing 11 similar export jobs that run only once a day, and browsing that folder causes no delay.

  • Alan Mark

    Hi Michael,

    I discussed this with the engineers, and they mentioned that the record-count bars are recalculated each time you switch to a folder.

    Do you really need to keep as many of these runs as you are keeping? 4 runs per hour × 24 hours × 4 days is 384 runs per export job. If one of the exports has a lot of records, that could explain the delay.
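    The retention arithmetic above can be checked with a short Python sketch. It assumes that Housekeeping keeps every execution from the last N days and that each workbook run triggers one run of each dependent export job; the function names are illustrative and not part of Datameer. For comparison it applies the same rule to the folder of 11 once-a-day exports mentioned earlier.

```python
# Back-of-the-envelope count of retained executions.
# Assumption: Housekeeping keeps every run from the last `retention_days` days.

MINUTES_PER_DAY = 24 * 60


def retained_runs(interval_minutes: int, retention_days: int) -> int:
    """Executions kept for one job that runs every `interval_minutes`."""
    runs_per_day = MINUTES_PER_DAY // interval_minutes
    return runs_per_day * retention_days


def folder_runs(job_count: int, interval_minutes: int, retention_days: int) -> int:
    """Total executions kept across all export jobs in one folder."""
    return job_count * retained_runs(interval_minutes, retention_days)


if __name__ == "__main__":
    # Slow folder: 10 export jobs triggered every 15 minutes, 4-day retention.
    print(retained_runs(15, 4))                 # 384 per job
    print(folder_runs(10, 15, 4))               # 3840 for the folder
    # Fast folder: 11 export jobs triggered once a day, 4-day retention.
    print(folder_runs(11, MINUTES_PER_DAY, 4))  # 44 for the folder
```

    Under these assumptions, the slow folder carries roughly 87 times as many executions as the daily one (3,840 vs. 44).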

    Could you please check which export has the most records, move it out of this folder, and test whether that speeds things up for you?

  • Michael Ahn

    Hi Alan,

    I certainly don't need any of the historic job runs for these export jobs. But can I configure that for just these jobs?
    Moving the 2 exports with the largest number of records (100,000 each) to a new directory does not improve the response times.
    I see no reason why the number of records should have anything to do with the time it takes to show the folder in the browser.
    However, the response times are sometimes fast when I toggle the view between the two directories that now contain these exports.
    Maybe it is an issue with the MySQL database behind Datameer, so that the listing is only fast when the result comes from the cache. Here are the statistics for the dap database:

    mysql> SELECT TABLE_NAME, table_rows, data_length, index_length,
        -> round(((data_length + index_length) / 1024 / 1024),2) 'SizeMB'
        -> FROM information_schema.TABLES
        -> WHERE table_schema = 'dap';
    +------------------------------------+------------+-------------+--------------+--------+
    | TABLE_NAME                         | table_rows | data_length | index_length | SizeMB |
    +------------------------------------+------------+-------------+--------------+--------+
    | accept_terms                       |         34 |       16384 |        16384 |   0.03 |
    | access_token                       |          0 |       16384 |        16384 |   0.03 |
    | application                        |         41 |       81920 |        32768 |   0.11 |
    | artifact                           |        215 |       49152 |        16384 |   0.06 |
    | authenticable_group                |          0 |       16384 |        16384 |   0.03 |
    | connection_test_table              |          0 |           0 |         1024 |   0.00 |
    | custom_data_sink                   |         14 |       16384 |        16384 |   0.03 |
    | custom_role                        |          2 |       16384 |        16384 |   0.03 |
    | customrole_auth_group              |          0 |       16384 |        32768 |   0.05 |
    | customrole_user                    |         40 |       16384 |        32768 |   0.05 |
    | daily_export_job_statistics        |      82256 |     5783552 |      4734976 |  10.03 |
    | daily_import_job_statistics        |      44463 |     3686400 |      2637824 |   6.03 |
    | daily_workbook_statistics          |      43614 |     4734976 |      2637824 |   7.03 |
    | dap_file                           |       3203 |     1589248 |       983040 |   2.45 |
    | dap_job_configuration              |       2497 |      212992 |       163840 |   0.36 |
    | dap_job_execution                  |      19112 |     8404992 |      2555904 |  10.45 |
    | dap_job_execution_dap_job_counter  |      73306 |     6307840 |      3162112 |   9.03 |
    | data                               |     955236 |   190398464 |    122372096 | 298.28 |
    | data_base_data_sink                |          1 |       16384 |        16384 |   0.03 |
    | data_group_permission              |          2 |       16384 |        32768 |   0.05 |
    | data_migration                     |          0 |       16384 |        49152 |   0.06 |
    | data_mining_configuration          |         93 |       16384 |            0 |   0.02 |
    | data_mining_configuration_property |       6202 |     1572864 |       229376 |   1.72 |
    | data_mining_model                  |         52 |       16384 |        16384 |   0.03 |
    | data_mining_model_property         |       4476 |      458752 |       131072 |   0.56 |
    | data_partition                     |       3744 |      393216 |       114688 |   0.48 |
    | data_permission                    |       1615 |       65536 |            0 |   0.06 |
    | data_sink_configuration            |        223 |       16384 |        49152 |   0.06 |
    | data_source_configuration          |        699 |       81920 |        32768 |   0.11 |
    | data_source_data                   |      16684 |   253247488 |       278528 | 241.78 |
    | data_store                         |        147 |       16384 |        49152 |   0.06 |
    | data_store_configuration_property  |       1133 |      147456 |        81920 |   0.22 |
    | data_volume_summary                |      44851 |     2637824 |      3686400 |   6.03 |
    | db_driver                          |          7 |       16384 |        16384 |   0.03 |
    | db_driver_jar_file_names           |          7 |       16384 |        16384 |   0.03 |
    | extension_point_state              |          4 |       16384 |            0 |   0.02 |
    | extension_state                    |         15 |       16384 |            0 |   0.02 |
    | field                              |      26180 |     3686400 |      1589248 |   5.03 |
    | file_data_sink                     |        208 |       16384 |        16384 |   0.03 |
    | file_dependency                    |       3603 |     2637824 |       311296 |   2.81 |
    | filesystem_artifact_to_delete      |          2 |       16384 |        16384 |   0.03 |
    | filter                             |       2257 |      114688 |        65536 |   0.17 |
    | filter_argument                    |       2899 |      294912 |       131072 |   0.41 |
    | folder                             |        481 |       81920 |       114688 |   0.19 |
    | folder_group_permission            |         37 |       16384 |        32768 |   0.05 |
    | folder_permission                  |        439 |       32768 |            0 |   0.03 |
    | formula                            |      70239 |     6832128 |      6324224 |  12.55 |
    | group_permission                   |        241 |       16384 |        32768 |   0.05 |
    | infographic_model                  |        514 |    34095104 |        32768 |  32.55 |
    | installation_history               |         16 |       16384 |        16384 |   0.03 |
    | job_configuration_property         |      11116 |     1589248 |       409600 |   1.91 |
    | mapping                            |       5155 |      311296 |       344064 |   0.63 |
    | optional_upgrade                   |         89 |       16384 |        16384 |   0.03 |
    | partition_index                    |       2714 |       98304 |            0 |   0.09 |
    | permission                         |       3202 |      163840 |            0 |   0.16 |
    | plugin_state                       |          4 |       16384 |            0 |   0.02 |
    | property                           |         43 |       16384 |        16384 |   0.03 |
    | role_capability                    |         90 |       16384 |        16384 |   0.03 |
    | sheet                              |       7910 |    48840704 |      3358720 |  49.78 |
    | sheet_field                        |          0 |       16384 |        32768 |   0.05 |
    | sheet_viewstate                    |      11029 |      507904 |            0 |   0.48 |
    | sink_data                          |     916635 |    34144256 |     13123584 |  45.08 |
    | sort                               |       2671 |      376832 |       114688 |   0.47 |
    | stored_file                        |         89 |       16384 |        32768 |   0.05 |
    | system_job_configuration           |          1 |       16384 |        16384 |   0.03 |
    | system_job_data                    |          1 |       16384 |        16384 |   0.03 |
    | temporary_data                     |          0 |       16384 |        32768 |   0.05 |
    | test_entity2                       |          0 |       16384 |            0 |   0.02 |
    | user                               |         36 |       16384 |        16384 |   0.03 |
    | user_group                         |          9 |       16384 |        16384 |   0.03 |
    | usergroup_user                     |         38 |       16384 |        32768 |   0.05 |
    | workbook_configuration             |       1675 |      114688 |       180224 |   0.28 |
    | workbook_data                      |       3543 |      163840 |        98304 |   0.25 |
    | workbook_sheet_data                |     102502 |   440025088 |      4227072 | 423.67 |
    | workbook_viewstate                 |       1051 |       81920 |            0 |   0.08 |
    +------------------------------------+------------+-------------+--------------+--------+
    75 rows in set (0.08 sec)
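
    The SizeMB column is (data_length + index_length) / 1024 / 1024, rounded to two decimals. A minimal Python sketch that reproduces this expression and ranks a few rows copied from the output above makes it easier to see which tables dominate; the chosen subset, the TableStats type, and the helper names are illustrative.

```python
# Reproduces the SizeMB column of the query above:
#   round((data_length + index_length) / 1024 / 1024, 2)
# and ranks a handful of rows copied from that output by total size.
from typing import NamedTuple


class TableStats(NamedTuple):
    name: str
    table_rows: int
    data_length: int
    index_length: int


def size_mb(data_length: int, index_length: int) -> float:
    """Data plus index size in MiB, rounded like the SQL expression."""
    return round((data_length + index_length) / 1024 / 1024, 2)


# A subset of rows from the information_schema.TABLES output above.
ROWS = [
    TableStats("daily_export_job_statistics", 82256, 5783552, 4734976),
    TableStats("dap_job_execution", 19112, 8404992, 2555904),
    TableStats("data", 955236, 190398464, 122372096),
    TableStats("data_source_data", 16684, 253247488, 278528),
    TableStats("sink_data", 916635, 34144256, 13123584),
    TableStats("workbook_sheet_data", 102502, 440025088, 4227072),
]


def largest_tables(rows: list[TableStats], top: int = 3) -> list[tuple[str, float]]:
    """Return (table name, SizeMB) for the `top` largest tables."""
    ranked = sorted(rows, key=lambda r: r.data_length + r.index_length, reverse=True)
    return [(r.name, size_mb(r.data_length, r.index_length)) for r in ranked[:top]]


if __name__ == "__main__":
    for name, mb in largest_tables(ROWS):
        print(f"{name:30} {mb:8.2f} MB")
```

    On these rows, the execution-related tables (dap_job_execution and daily_export_job_statistics) are about 10 MB each, far smaller than workbook_sheet_data, data, and data_source_data. If the tables use InnoDB, keep in mind that table_rows in information_schema.TABLES is only an estimate.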

  • Alan Mark

    Hi Michael,

    According to the engineers, the record-count bars shown in the file browser are recalculated whenever a folder is loaded, although I will concede that some browser caching may be causing the intermittent fast load times.
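
    To illustrate why recalculating on every load would tie the listing time to the number of retained executions, and why a cache could explain the intermittent fast loads, here is a toy Python model. It is hypothetical: the folder names are made up, the per-job execution counts reuse the figures from this thread (10 jobs × 384 runs vs. 11 jobs × 4 runs), the cost is simply the number of executions touched, and it does not reflect how Datameer actually computes the bars.

```python
# Toy model (hypothetical, not Datameer's implementation): drawing the
# record-count bars for a folder touches every retained execution of every job.
from typing import Optional

# Hypothetical folders: retained executions for each export job in the folder.
FOLDERS: dict[str, list[int]] = {
    "every_15_min": [384] * 10,  # 10 jobs, 96 runs/day kept for 4 days
    "daily": [4] * 11,           # 11 jobs, 1 run/day kept for 4 days
}


def load_cost(folder: str, cache: Optional[set[str]] = None) -> int:
    """Executions touched to draw the bars for one folder load.

    Without a cache every load recalculates all bars; with a cache, a folder
    that has already been loaded costs nothing.
    """
    if cache is not None:
        if folder in cache:
            return 0
        cache.add(folder)
    return sum(FOLDERS[folder])


def toggle_cost(sequence: list[str], use_cache: bool) -> int:
    """Total work for loading the folders in `sequence` one after another."""
    cache: Optional[set[str]] = set() if use_cache else None
    return sum(load_cost(folder, cache) for folder in sequence)


if __name__ == "__main__":
    toggles = ["every_15_min", "daily", "every_15_min", "daily"]
    print(toggle_cost(toggles, use_cache=False))  # 7768
    print(toggle_cost(toggles, use_cache=True))   # 3884
```

    In this model, toggling between the two folders twice costs 7,768 execution reads without a cache but only 3,884 with one, because each folder is computed once.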

    When you mentioned the 15-minute schedule with 4-day retention earlier, was that set on the workbooks driving these exports, in Housekeeping, or as an option on the export jobs?

    I'll definitely keep working on this with you here in the community, but I'd encourage you to use Zendesk for your tickets going forward. The issues you find tend to be more complex than the forum is intended for.

  • Michael Ahn

    The export jobs are triggered automatically by workbooks that run every 15 minutes. Housekeeping is set to 4 days of retention. Before the weekend I switched one of the workbooks to daily execution, so only 4 job runs are left for that workbook and its three dependent export jobs. I moved these three exports to a new folder, and this folder now lists quickly.

  • Alan Mark

    Hi Michael,

    So it seems the engineers were right that this was caused by the recalculation of the record-count bars in the file browser, since reducing the number of executions reduces the number of bars that have to be calculated.

    I'm discussing with them how they want to move forward with this, and I'll post back once I have more information.
