Spark Archives - BigDATAwire

The Spark-to-Ray Migration That Will Save Amazon $100M+ Per Year

When Ray first emerged from the UC Berkeley RISELab back in 2017, it was positioned as a possible replacement for Apache Spark. But as Anyscale, the commercial outfit behind Ray, scaled up its own operations, the “Ray…

Apache Arrow Announces DataFusion Comet

Apache Arrow, a software development platform for building high-performance applications, has announced the donation of the Comet project. Comet is an Apache Spark plugin that uses Apache Arrow DataFusion to impro…

There Are Many Paths to the Data Lakehouse. Choose Wisely

You don’t need a crystal ball to see that the data lakehouse is the future. At some point soon, it will be the default way of interacting with data, combining scale with cost-effectiveness. Also easy to predict is t…

Cloudera Sees Iceberg Everywhere

Cloudera gave its hybrid cloud customers a big boost today when it announced on-prem support for the Apache Iceberg table format. The move gives customers the capability to access and process on-prem data with any Iceber…

HPE Brings Analytics Together on its Data Fabric

HPE today unveiled a major update to its Ezmeral software platform, which previously included over a dozen components but now includes just two, including the Ezmeral Data Fabric that provides edge-to-cloud data manageme…

IBM Embraces Iceberg, Presto in New Watsonx Data Lakehouse

IBM yesterday unveiled watsonx.data, a new data lakehouse offering for cloud and on-prem that will use object storage and Apache Iceberg, an open data format. Big Blue launched two other offerings in the new watsonx fami…

NeuroBlade Seeks Controlled Growth for Big Data Bottleneck-Buster

Early adopters of NeuroBlade’s processing-in-memory (PIM) architecture, called XRAM, are showing a 10x to 60x boost in throughput for big SQL workloads. But the company is playing it safe on the growth front, so don’…

AWS Bolsters Glue ETL Tool with Data Observability, Ray Support

AWS has made a big push into data management during re:Invent this week, with the unveiling of DataZone and launch of zero-ETL capabilities in Redshift. But AWS also bolstered its ETL tool with the launch of Amazon Glue…

Esri Melds GIS with AI, Graph, and Analytics

Esri has long been the industry leader in geographic information systems (GIS), which are used by urban planners, building engineers, and landscape designers around the world. At its UC 2022 conference this week, the com…

NetApp Spots a Data Platform Opportunity in the Cloud

The market for spot instances in the cloud is, well, spotty. Some days you can count on a 90% discount by buying excess capacity from the public clouds, and other times you can't. NetApp has turned that cloud unpredictab…

EMR Serverless Now Available from AWS

Amazon EMR, which ostensibly is the world’s most popular hosted Hadoop environment, is now generally available as a serverless offering, AWS announced today. Amazon EMR Serverless will save customers time and money…

In Search of the Data Dream Team

When it comes to succeeding at big data, the people you put in place are just as important as, if not more important than, the products and technologies you use. One of the folks exploring the intersection of people and dat…

Kubernetes Adoption Widespread for Big Data, But Monitoring and Tuning Are Issues, Survey Finds

Kubernetes may be a complex piece of software that can be difficult to monitor and manage. But the benefits of running applications in the popular container orchestration system appear to outweigh the disadvantages, beca…

Alluxio Nabs $50M, Preps for Growth in Data Orchestration

Data orchestration software provider Alluxio today announced the close of an oversubscribed $50-million Series C round, which its CEO plans to spend on a global expansion. It also launched version 2.7 of its software, wh…

Aerospike Turbocharges Spark ML Training with Pushdown Processing

Companies that need to access a lot of data in a hurry, such as retraining a machine learning model in Spark, have traditionally had to move that data from the edge to a central repository, such as a cloud data lake. But…

Informatica Accelerates DataOps with Spark, GPUs

Informatica today announced that customers can see up to a 5x performance boost for ETL and data management workloads when they run them under its new cloud-based data integration engine that’s powered by Apache Spark…

Prophecy Spins Up Low-Code Data Pipeline Tool

In recent years, the shortage of data engineers has at times exceeded the shortage of data scientists. To help close the gap, a Silicon Valley startup called Prophecy today unveiled a low-code data engineering tool that…

LinkedIn’s Translation Engine Linked to Presto

An SQL translation engine unveiled this week by LinkedIn is integrated with other open-source SQL query engines like Presto in a combination aimed at bulging data lakes. The Microsoft unit’s Coral engine handles ana…

Data Exchange Maker Harbr Closes Series A

Harbr, a London startup that helps organizations like Moody’s Analytics to create their own custom data exchanges, yesterday announced that it has completed a Series A round of financing, netting $38.5 million for the…

The Past and Future of In-Memory Computing

When Nikita Ivanov co-founded GridGain Systems back in 2005, he envisioned in-memory computing going mainstream and becoming a massive category unto itself within a few years. That obviously didn’t pan out, but on the…
