Sunday, May 3, 2020

Oracle Autonomous Database : Delete files from DATA_PUMP_DIR

To delete a single file from DATA_PUMP_DIR, we first need to know the name of the file.

The statement below can be used to list all the files available in DATA_PUMP_DIR:

select * from DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR');

Now copy the name of the file you want to delete and use the command below to delete it:

BEGIN
  UTL_FILE.FREMOVE('DATA_PUMP_DIR', 'FILE_NAME');
END;
/

There can also be a case where you want to delete all the files from DATA_PUMP_DIR, or from any other directory defined in the Oracle Database / ADB. Use the block below to delete all the files in one shot.

BEGIN
  FOR r IN (SELECT object_name FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR'))
  LOOP
    UTL_FILE.FREMOVE('DATA_PUMP_DIR', r.object_name);
  END LOOP;
END;
/

We can modify the SELECT statement to filter for only the files that should be deleted, for example files matching a specific pattern such as .log or .dmp, as shown below.
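As a minimal sketch (assuming the export dump files all end in .dmp), the loop above can be restricted to just those files:

BEGIN
  FOR r IN (SELECT object_name
              FROM DBMS_CLOUD.LIST_FILES('DATA_PUMP_DIR')
             WHERE object_name LIKE '%.dmp')
  LOOP
    -- Remove only the dump files, leaving logs and other files in place.
    UTL_FILE.FREMOVE('DATA_PUMP_DIR', r.object_name);
  END LOOP;
END;
/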

Tuesday, April 21, 2020

Oracle Big Data SQL aka Cloud SQL

This post is intended to outline the Oracle Big Data SQL offering; it is more of an introduction to what it is and where to find further information.

Introduction and terminology


Oracle Big Data SQL leverages Oracle cell smart-scan technology to fetch large amounts of data defined as Oracle external tables using the Oracle Big Data SQL access drivers (oracle_hdfs, oracle_hive, and, starting with Big Data SQL (aka BDSQL) 4.0, oracle_bigdata).
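As a rough illustration of what such an external table looks like (the table, column and Hive object names here are made up for the example), an oracle_hive external table can be declared as follows and then queried like any regular table:

-- Hypothetical example: expose the Hive table order_db.order_summary
-- to the database through the ORACLE_HIVE access driver.
CREATE TABLE order_summary_ext (
  cust_num    VARCHAR2(10),
  order_num   VARCHAR2(20),
  order_total NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (
    com.oracle.bigdata.tablename: order_db.order_summary
  )
)
REJECT LIMIT UNLIMITED;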

An Oracle query that involves such an external table triggers a describe call from an Oracle Database foreground process to access metadata (from the Hadoop Hive Metastore / NameNode), which includes granules (i.e., data blocks) and their location information. The granules are then batched and assigned to the eligible Parallel Query (PQ) slaves (or to the foreground process itself for a serial query), which interface with the Big Data SQL (or BDSQL) cells running on the data nodes. There can be hundreds or thousands of granules involved in a single query. A query involving granules on all data nodes will be able to leverage all the Big Data SQL cells running on those data nodes, thus scaling (i.e., parallelizing) the query the best and getting the most benefit. After the data is fetched into the cell, it can be filtered before being sent back to the database, which can shorten the overall query elapsed time significantly.
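To see this in action, such an external table can be queried in parallel and the offload activity observed through the 'cell XT' statistics (the external table / Big Data SQL counters). The example below reuses the hypothetical order_summary_ext table from above, so treat it as a sketch rather than a definitive recipe:

-- Run the query in parallel so the granules are spread across PQ slaves;
-- the WHERE predicate is a candidate for filtering inside the cells.
SELECT /*+ PARALLEL(8) */ COUNT(*)
  FROM order_summary_ext
 WHERE order_total > 100;

-- System-wide 'cell XT' counters show how many granules and bytes
-- were requested for predicate offload to the Big Data SQL cells.
SELECT name, value
  FROM v$sysstat
 WHERE name LIKE 'cell XT%'
 ORDER BY name;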

Retries/Fallback


By default there are three replicas of each granule, each on a different data node. Each granule therefore has three data nodes assigned, so there will be up to three attempts to fetch each granule. If all three attempts (aka retries) fail, the granule will have to be fetched and processed directly by the database (aka fallback mode), in which case the data cannot be filtered in the cells (impacting performance).

A small retries count might not represent a problem. However, if the number keeps increasing, it indicates either that the Big Data SQL cell memory is not large enough to handle the workload, or that there are code / configuration issues.
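One way to keep an eye on this is to watch the retry-related 'cell XT' counters over time. The exact statistic names can vary between versions, so the broad filter below is only a sketch:

-- Per-instance counters related to retries / fallback; rerun periodically
-- and compare the values to see whether they keep growing.
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name LIKE 'cell XT%'
   AND (LOWER(name) LIKE '%retr%' OR LOWER(name) LIKE '%fallback%');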

Quarantines


When an internal error (e.g., ORA-600) occurs on any offload server thread, a quarantine is created in the Big Data SQL cell. The quarantine has the SQL_ID stored along with other attributes. Once a quarantine is created in a Big Data SQL cell, the same SQL (with the same SQL_ID) will not be allowed to be processed in that cell; the basic idea is that the same SQL would very likely cause the same internal error again. Once there are three quarantines (within a certain time range), the Big Data SQL cell will not accept any further SQL from the database.

Each quarantine will cause an incident trace to be created under /opt/oracle/bd_cell/log/diag/bdsql/cell/SYS*/incident/.

Below is the working architecture of Big Data SQL, which I will try to explain in a later post.



Documentation 


          User Guide 

          Installation Guide